diff --git a/AGENTS.md b/AGENTS.md
index 2b6eb3c2..048219da 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -176,3 +176,56 @@ Do / Don’t(壊れやすいパターンの禁止)
 運用の心得
 - 下層(Remote/Ownership)に疑義がある間は、上層(Publish/Adopt)を “無理に” 積み増さない。
 - 変更は常に A/B ガード付きで導入し、SIGUSR2/リングとワンショットログで芯を掴んでから上に進む。
+
+---
+
+## 健康診断ランと注意事項(Superslab / madvise / Pool 用)
+
+このリポジトリは Superslab / madvise / Pool v1 flatten など OS 依存の経路を多用します。
+「いつの間にか壊れていた」を防ぐために、次の“健康診断ラン”と注意事項を守ってください。
+
+- DSO 領域には触らない(Superslab OS Box のフェンス)
+  - `core/box/ss_os_acquire_box.h` の `ss_os_madvise_guarded()` は **libc/libm/ld.so など DSO 領域を dladdr で検出したら即スキップ** します。
+  - DSO に対する madvise 試行は **バグ扱い**。`g_ss_madvise_disabled` / DSO-skip ログを必ず 1 回だけ出し、以降は触らない前提です。
+  - 開発/CI では(必要なら)`HAKMEM_SS_MADVISE_DSO_FAILFAST=1` を使って、「DSO に一度でも触ろうとしたら即 abort」するチェックランを追加してください。
+
+- madvise / vm.max_map_count 用 健康診断ラン
+  - 目的: Superslab OS Box が ENOMEM(vm.max_map_count)に達しても安全に退避できているか、DSO 領域を誤って触っていないかを確認する。
+  - 推奨コマンド(C7_SAFE + mid/smallmid, Superslab/madvise 経路の smoke 用):
+    ```sh
+    HAKMEM_BENCH_MIN_SIZE=257 \
+    HAKMEM_BENCH_MAX_SIZE=768 \
+    HAKMEM_TINY_HEAP_PROFILE=C7_SAFE \
+    HAKMEM_TINY_C7_HOT=1 \
+    HAKMEM_TINY_HOTHEAP_V2=0 \
+    HAKMEM_SMALL_HEAP_V3_ENABLED=1 \
+    HAKMEM_SMALL_HEAP_V3_CLASSES=0x80 \
+    HAKMEM_POOL_V2_ENABLED=0 \
+    HAKMEM_POOL_V1_FLATTEN_ENABLED=0 \
+    HAKMEM_SS_OS_STATS=1 \
+    ./bench_mid_large_mt_hakmem 5000 256 1
+    ```
+  - チェックポイント:
+    - 終了時に `[SS_OS_STATS] ... madvise_enomem=0 madvise_disabled=0` が理想(環境次第で ENOMEM は許容、ただし disabled=1 になっていれば以降の madvise は止まっている)。
+    - DSO-skip や DSO Fail-Fast ログが出ていないこと(出た場合は ptr 分類/経路を優先的にトリアージ)。
+
+- Pool v1 flatten のプロファイル注意
+  - LEGACY プロファイル専用の最適化です。`HAKMEM_TINY_HEAP_PROFILE=C7_SAFE` / `C7_ULTRA_BENCH` のときは **コード側で強制OFF** されます。
+  - flatten を触るときの健康診断ラン(LEGACY想定):
+    ```sh
+    HAKMEM_BENCH_MIN_SIZE=257 \
+    HAKMEM_BENCH_MAX_SIZE=768 \
+    HAKMEM_TINY_HEAP_PROFILE=LEGACY \
+    HAKMEM_POOL_V2_ENABLED=0 \
+    HAKMEM_POOL_V1_FLATTEN_ENABLED=1 \
+    HAKMEM_POOL_V1_FLATTEN_STATS=1 \
+    ./bench_mid_large_mt_hakmem 1 1000000 400 1
+    ```
+  - チェックポイント:
+    - `[POOL_V1_FLAT] alloc_tls_hit` / `free_tls_hit` が増えていること(flatten 経路が効いている)。
+    - `free_fb_*`(page_null / not_mine / other)は**少数**に収まっていること。増えてきたら owner 判定/lookup 側を優先トリアージする。
+
+- 一般ルール(壊れたらまず健康診断ラン)
+  - Tiny / Superslab / Pool に手を入れたあと、まず上記の健康診断ランを 1 回だけ回してから長尺ベンチ・本番 A/B に進んでください。
+  - 健康診断ランが落ちる場合は **新しい最適化を積む前に** Box 境界(ptr 分類 / Superslab OS Box / Pool v1 flatten Box)を優先的に直します。
+  - ベンチや評価を始めるときは、`docs/analysis/ENV_PROFILE_PRESETS.md` のプリセット(MIXED_TINYV3_C7_SAFE / C6_HEAVY_LEGACY_POOLV1 / DEBUG_TINY_FRONT_PERF)から必ずスタートし、追加した ENV はメモを残してください。単発の ENV を散らすと再現が難しくなります。
diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index d9c0e03f..8016526a 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -1,10 +1,44 @@
 ## HAKMEM 状況メモ (2025-12-05 更新 / C7 Warm/TLS Bind 反映)
 
+### Phase FP1: Mixed 16–1024B madvise A/B(C7-only v3, front v3+LUT+fast classify ON, ws=400, iters=1M, Release)
+- Baseline (MIXED_TINYV3_C7_SAFE, SS_OS_STATS=1): **32.76M ops/s**。`[SS_OS_STATS] madvise=4 madvise_enomem=1 madvise_disabled=1`(warmup で ENOMEM→madvise 停止)。perf: task-clock 50.88ms / minor-faults 6,742 / user 35.3ms / sys 16.2ms。
+- Low-madvise(+`HAKMEM_FREE_POLICY=keep HAKMEM_DISABLE_BATCH=1 HAKMEM_SS_MADVISE_STRICT=0`, SS_OS_STATS=1): **32.69M ops/s**。`madvise=3 enomem=0 disabled=0`。perf: task-clock 54.96ms / minor-faults 6,724 / user 35.1ms / sys 20.8ms。
+- Batch+THP 
寄り(+`HAKMEM_FREE_POLICY=batch HAKMEM_DISABLE_BATCH=0 HAKMEM_THP=auto`, SS_OS_STATS=1): **33.24M ops/s**。`madvise=3 enomem=0 disabled=0`。perf: task-clock 49.57ms / minor-faults 6,731 / user 35.4ms / sys 15.1ms。
+- 所感: pf/OPS とも大差なし。低 madvise での改善は見られず、Batch+THP 側がわずかに良好(+1〜2%)。vm.max_map_count が厳しい環境で failfast を避けたい場合のみ keep/STRICT=0 に切り替える運用が現実的。
+
 ### Hotfix: madvise(ENOMEM) を握りつぶし、以降の madvise を停止(Superslab OS Box)
 - 変更: `ss_os_madvise_guarded()` を追加し、madvise が ENOMEM を返したら `g_ss_madvise_disabled=1` にして以降の madvise をスキップ。EINVAL だけは従来どおり STRICT=1 で Fail-Fast(ENV `HAKMEM_SS_MADVISE_STRICT` で緩和可)。
 - stats: `[SS_OS_STATS]` に `madvise_enomem/madvise_other/madvise_disabled` を追加。HAKMEM_SS_OS_STATS=1 で確認可能。
 - ねらい: vm.max_map_count 到達時の大量 ENOMEM で VMA がさらに分割されるのを防ぎ、アロケータ自体は走り続ける。
 
+### PhaseS1: SmallObject v3 C6 トライ前のベースライン(C7-only)
+- 条件: Release, `./bench_random_mixed_hakmem 1000000 400 1`、ENV `HAKMEM_BENCH_MIN_SIZE=16 HAKMEM_BENCH_MAX_SIZE=1024 HAKMEM_TINY_HEAP_PROFILE=C7_SAFE HAKMEM_TINY_C7_HOT=1 HAKMEM_TINY_HOTHEAP_V2=0 HAKMEM_SMALL_HEAP_V3_ENABLED=1 HAKMEM_SMALL_HEAP_V3_CLASSES=0x80 HAKMEM_POOL_V2_ENABLED=0`(C7 v3 のみ)。
+- 結果: Throughput ≈ **46.31M ops/s**(segv/assert なし、SS/Rel ログのみ)。Phase S1 で C6 v3 を追加する際の比較用ベースとする。
+- C6-only v3(research / bench 専用): `HAKMEM_BENCH_MIN_SIZE=257 MAX_SIZE=768 TINY_HEAP_PROFILE=C7_SAFE TINY_C7_HOT=1 TINY_C6_HOT=1 TINY_HOTHEAP_V2=0 SMALL_HEAP_V3_ENABLED=1 SMALL_HEAP_V3_CLASSES=0x40 POOL_V2_ENABLED=0` → Throughput ≈ **36.77M ops/s**(segv/assert なし)。C6 stats `route_hits=266,930 alloc_refill=5 fb_v1=0 page_of_fail=0`(C7 は v1 ルート)。
+- Mixed 16–1024B C6+C7 v3: `HAKMEM_SMALL_HEAP_V3_CLASSES=0xC0 SMALL_HEAP_V3_STATS=1 TINY_C6_HOT=1` で `./bench_random_mixed_hakmem 1000000 400 1` → Throughput ≈ **44.45M ops/s**、`cls6 route_hits=137,307 alloc_refill=1 fb_v1=0 page_of_fail=0` / `cls7 route_hits=283,170 alloc_refill=2,446 fb_v1=0 page_of_fail=0`。C7 slow/refill は従来レンジ。
+- 追加 A/B(C6-heavy v1 vs v3): 同条件 `MIN=257 MAX=768 ws=400 iters=1M` で `CLASSES=0x80`(C6 v1)→ **47.71M ops/s**(v3 stats は cls7 のみ)、`CLASSES=0x40`(C6 v3)→ **36.77M ops/s**。約 -23% で v3 が劣後。
+- Mixed 16–1024B 追加 A/B: `CLASSES=0x80`(C7-only)→ **47.45M ops/s**、`CLASSES=0xC0`(C6+C7 v3)→ **44.45M ops/s**(約 -6%)。cls6 stats は route_hits=137,307 alloc_refill=1 fb_v1=0 page_of_fail=0。
+- 方針: デフォルトは C7-only(mask 0x80)のまま。C6 v3 は `HAKMEM_SMALL_HEAP_V3_CLASSES` bit6 で明示 opt-in(研究箱)。ベンチ時は `HAKMEM_TINY_C6_HOT=1` を併用して tiny front を確実に通す。C6 v3 は現状 C6-heavy/Mixed とも性能マイナスのため、研究箱据え置き。
+- 確定: 標準プロファイルは `HAKMEM_SMALL_HEAP_V3_CLASSES=0x80`(C7-only v3 固定)。bit6(C6)は研究専用で本線に乗せない。
+- C6-heavy / C6 を v1 固定で走らせる推奨プリセット:
+  ```sh
+  HAKMEM_BENCH_MIN_SIZE=257
+  HAKMEM_BENCH_MAX_SIZE=768
+  HAKMEM_TINY_HEAP_PROFILE=C7_SAFE
+  HAKMEM_TINY_C6_HOT=1
+  HAKMEM_SMALL_HEAP_V3_ENABLED=1
+  HAKMEM_SMALL_HEAP_V3_CLASSES=0x80   # C7-only v3
+  ```
+
+### Mixed 16–1024B 新基準(C7-only v3 / front v3 ON, 2025-12-05)
+- ENV: `HAKMEM_BENCH_MIN_SIZE=16 MAX_SIZE=1024 TINY_HEAP_PROFILE=C7_SAFE TINY_C7_HOT=1 TINY_HOTHEAP_V2=0 SMALL_HEAP_V3_ENABLED=1 SMALL_HEAP_V3_CLASSES=0x80 POOL_V2_ENABLED=0`(front v3/LUT はデフォルト ON、v3 stats ON)。
+- HAKMEM: **44.45M ops/s**、`cls7 alloc_refill=2446 fb_v1=0 page_of_fail=0`(segv/assert なし)。
+- mimalloc: **117.20M ops/s**。system: **90.95M ops/s**。→ HAKMEM は mimalloc の約 **38%**、system の約 **49%**。
+
+### C6-heavy 最新ベースライン(C6 v1 固定 / flatten OFF, 2025-12-05)
+- ENV: `HAKMEM_BENCH_MIN_SIZE=257 MAX_SIZE=768 TINY_HEAP_PROFILE=C7_SAFE TINY_C6_HOT=1 SMALL_HEAP_V3_ENABLED=1 SMALL_HEAP_V3_CLASSES=0x80 POOL_V2_ENABLED=0 POOL_V1_FLATTEN_ENABLED=0`。
+- HAKMEM: **29.01M ops/s**(segv/assert なし)。Phase80/82 以降の比較用新基準。
+
 ### Phase80: mid/smallmid Pool v1 flatten(C6-heavy)
 - 目的: mid/smallmid の pool v1 ホットパスを薄くし、C6-heavy ベンチで +5〜10% 程度の底上げを狙う。
 - 実装: `core/hakmem_pool.c` に v1 専用のフラット化経路(`hak_pool_try_alloc_v1_flat` / `hak_pool_free_v1_flat`)を追加し、TLS ring/lo hit 時は即 return・その他は従来の `_v1_impl` へフォールバックする Box に分離。ENV `HAKMEM_POOL_V1_FLATTEN_ENABLED`(デフォルト0)と `HAKMEM_POOL_V1_FLATTEN_STATS` でオンオフと統計を制御。
@@ -866,3 +900,37 @@ v2 内部のリスト (`current_page` / `partial_pages` / `full_pages`) から
    unlink したら Hot 側の state を全て破棄する。
 3. `TinyColdIface` を **「refill/retire だけの境界」**として明確化し、Hot Box から Cold Box への侵入(meta/used/freelist の直接操作)をこれ以上増やさない。
 4. C7-only で v2 ON/OFF を A/B しつつ、`cold_refill_fail` が 0 に張り付いていること、`alloc_fast` ≈ v1 の `fast` 件数に近づいていることを確認する(性能よりもまず安定性・境界の分離を優先)。
+
+### Phase ML1: Pool v1 Zero コスト削減(memset 89.73% 軽量化)
+
+**背景**: C6-heavy(mid/smallmid, Pool v1/flatten 系)ベンチで `__memset_avx2_unaligned_erms` が self **89.73%** を占有(perf 実測)。
+
+**実装**: ChatGPT により修正完了
+- `core/box/pool_zero_mode_box.h` 新設(ENV キャッシュ経由で ZERO_MODE を統一管理)
+- `core/bench_profile.h`: glibc setenv 呼び出しをセグフォから守るため、RTLD_NEXT 経由の malloc+putenv に切り替え
+- `core/hakmem_pool.c`: zero mode に応じた memset 制御(FULL/header/off)
+
+**A/B テスト結果(C6-heavy, PROFILE=C6_HEAVY_LEGACY_POOLV1, flatten OFF)**:
+
+| Iterations | ZERO_MODE=full | ZERO_MODE=header | 改善 |
+|-----------|----------------|-----------------|------|
+| 10K | 3.06 M ops/s | 3.17 M ops/s | **+3.65%** |
+| **1M** | **23.71 M ops/s** | **27.34 M ops/s** | **+15.34%** 🚀 |
+
+**所感**: イテレーション数が増えると改善率も大きくなる(memset overhead の割合が増加)。header mode で期待値 +3-5% を大幅に超える +15% の改善を実現。デフォルトは `ZERO_MODE=full`(安全側)のまま、bench/micro-opt 時のみ `export HAKMEM_POOL_ZERO_MODE=header` で opt-in。
+
+**環境変数**:
+```bash
+# ベースライン(フル zero)
+export HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1
+./bench_mid_large_mt_hakmem 1 1000000 400 1
+# → 23.71 M ops/s
+
+# 軽量 zero(header + guard のみ)
+export HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1
+export HAKMEM_POOL_ZERO_MODE=header
+./bench_mid_large_mt_hakmem 1 1000000 400 1
+# → 27.34 M ops/s (+15.34%)
+```
+
+**次のステップ**: Phase 82 の full flatten が C7_SAFE で crash する理由を調査し、+13% の改善を実現することを検討。
diff --git a/Makefile b/Makefile
index 6a0991a8..341f1b84 100644
--- a/Makefile
+++ b/Makefile
@@ -13,7 +13,6 @@ help:
	@echo "Development (Fast builds):"
	@echo "  make bench_random_mixed_hakmem - Quick build (~1-2 min)"
	@echo "  make 
bench_tiny_hot_hakmem - Quick build" - @echo " make test_hakmem - Quick test build" @echo "" @echo "Benchmarking (PGO-optimized, +6% faster):" @echo " make pgo-tiny-full - Full PGO workflow (~5-10 min)" @@ -219,12 +218,12 @@ LDFLAGS += $(EXTRA_LDFLAGS) # Targets TARGET = test_hakmem -OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o 
core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o +OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o 
core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o OBJS = $(OBJS_BASE) # Shared library SHARED_LIB = libhakmem.so -SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o 
hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o +SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o 
core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o # Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1) ifeq ($(POOL_TLS_PHASE1),1) @@ -251,7 +250,7 @@ endif # Benchmark targets BENCH_HAKMEM = bench_allocators_hakmem BENCH_SYSTEM = bench_allocators_system -BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o 
hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o +BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o 
hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o 
core/tiny_destructors.o bench_allocators_hakmem.o BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o @@ -428,7 +427,7 @@ test-box-refactor: box-refactor ./larson_hakmem 10 8 128 1024 1 12345 4 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem) -TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o 
hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o core/smallobject_hotbox_v3.o +TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o 
hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o @@ -1234,7 +1233,7 @@ valgrind-hakmem-hot64-lite: .PHONY: unit unit-run UNIT_BIN_DIR := tests/bin -UNIT_BINS := $(UNIT_BIN_DIR)/test_super_registry $(UNIT_BIN_DIR)/test_ready_ring $(UNIT_BIN_DIR)/test_mailbox_box +UNIT_BINS := $(UNIT_BIN_DIR)/test_super_registry $(UNIT_BIN_DIR)/test_ready_ring $(UNIT_BIN_DIR)/test_mailbox_box $(UNIT_BIN_DIR)/madvise_guard_test $(UNIT_BIN_DIR)/libm_reloc_guard_test unit: $(UNIT_BINS) @echo "OK: unit tests built -> $(UNIT_BINS)" @@ -1251,10 +1250,20 @@ $(UNIT_BIN_DIR)/test_mailbox_box: tests/unit/test_mailbox_box.c tests/unit/mailb @mkdir -p $(UNIT_BIN_DIR) $(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS) +$(UNIT_BIN_DIR)/madvise_guard_test: tests/unit/madvise_guard_test.c core/box/madvise_guard_box.c + @mkdir -p $(UNIT_BIN_DIR) + $(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS) + +$(UNIT_BIN_DIR)/libm_reloc_guard_test: tests/unit/libm_reloc_guard_test.c core/box/libm_reloc_guard_box.c + @mkdir -p $(UNIT_BIN_DIR) + $(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS) + unit-run: unit @echo "Running unit: test_super_registry" && $(UNIT_BIN_DIR)/test_super_registry @echo "Running unit: test_ready_ring" && $(UNIT_BIN_DIR)/test_ready_ring @echo "Running unit: test_mailbox_box" && $(UNIT_BIN_DIR)/test_mailbox_box + 
@echo "Running unit: madvise_guard_test" && $(UNIT_BIN_DIR)/madvise_guard_test
+	@echo "Running unit: libm_reloc_guard_test" && $(UNIT_BIN_DIR)/libm_reloc_guard_test
 
 # Build 3-layer Tiny (new front) with low optimization for debug/testing
 larson_hakmem_3layer:
diff --git a/PERF_ANALYSIS_16_1024B_20251205.md b/PERF_ANALYSIS_16_1024B_20251205.md
index 2eff3ae6..95c24060 100644
--- a/PERF_ANALYSIS_16_1024B_20251205.md
+++ b/PERF_ANALYSIS_16_1024B_20251205.md
@@ -4,6 +4,18 @@
 **ベンチマーク**: `bench_random_mixed` (1M iterations, ws=400, seed=1)
 **サイズ範囲**: 16-1024 bytes (Tiny allocator: 8 size classes)
 
+## Quick Baseline Refresh (2025-12-05, C7-only v3 / front v3 ON)
+
+**ENV (Release)**: `HAKMEM_BENCH_MIN_SIZE=16 MAX_SIZE=1024 TINY_HEAP_PROFILE=C7_SAFE TINY_C7_HOT=1 TINY_HOTHEAP_V2=0 SMALL_HEAP_V3_ENABLED=1 SMALL_HEAP_V3_CLASSES=0x80 POOL_V2_ENABLED=0`(front v3/LUT デフォルト ON, SMALL_HEAP_V3_STATS=1)。
+
+| Allocator | Throughput (ops/s) | Ratio vs mimalloc |
+|-----------|--------------------|-------------------|
+| HAKMEM (C7-only v3) | **44,447,714** | 38.0% |
+| mimalloc | 117,204,756 | 100% |
+| glibc malloc | 90,952,144 | 77.6% |
+
+SmallObject v3 stats (cls7): `route_hits=283,170 alloc_refill=2,446 alloc_fb_v1=0 free_fb_v1=0 page_of_fail=0`。segv/assert なし。
+
 ---
 
 ## エグゼクティブサマリー
diff --git a/PERF_BOTTLENECK_ANALYSIS_20251204.md b/PERF_BOTTLENECK_ANALYSIS_20251204.md
index 82a4c6a8..fc38145f 100644
--- a/PERF_BOTTLENECK_ANALYSIS_20251204.md
+++ b/PERF_BOTTLENECK_ANALYSIS_20251204.md
@@ -4,6 +4,7 @@ Date: 2025-12-04
 Current Performance: 4.1M ops/s
 Target Performance: 16M+ ops/s (4x improvement)
 Performance Gap: 3.9x remaining
+mid/smallmid(C6-heavy)ベンチを再現するときは、`docs/analysis/ENV_PROFILE_PRESETS.md` の `C6_HEAVY_LEGACY_POOLV1` プリセットをスタートポイントにしてください。
 
 ## KEY METRICS SUMMARY
diff --git a/PHASE_ML1_CHATGPT_GUIDE.md b/PHASE_ML1_CHATGPT_GUIDE.md
new file mode 100644
index 00000000..53bd382e
--- /dev/null
+++ b/PHASE_ML1_CHATGPT_GUIDE.md
@@ -0,0 +1,62 @@
+# PHASE ML1: ChatGPT 
依頼用ガイド(Pool v1 memset 89.73% 課題)
+
+## 1. 背景情報
+- mid/smallmid (C6-heavy, Pool v1/flatten 系) のベンチで `__memset_avx2_unaligned_erms` が self 89.73% を占有(perf 実測)。
+- 目的: Pool v1 の zero コストを減らす(デフォルト安全は維持しつつ、ベンチ専用の opt-in を用意)。
+- 現状: zero mode を pool_api.inc.h に直接足したところ、ベンチ起動直後にセグフォが発生。
+
+## 2. 問題の詳細
+- セグフォの推測要因
+  - pool_api.inc.h が複数翻訳単位から include され、`static` キャッシュ変数が TU ごとにばらける。
+  - ENV 読み取りをヘッダ内で直接行ったため、初期化順や再定義が崩れている可能性。
+  - ZERO_MODE=header 実装が TLS/flatten 経路と食い違っているかもしれない。
+- 現在のコード(問題箇所のイメージ)
+  - `HAKMEM_POOL_ZERO_MODE` をヘッダ内で `static int g=-1; getenv(...);` する小さな関数を追加しただけで segfault。
+
+## 3. 修正案(2択)
+- 選択肢 A: Environment Cache を使う(推奨)
+  - `core/hakmem_env_cache.h` など既存の ENV キャッシュ箱に「pool_zero_mode」を追加し、ヘッダ側は薄い getter だけにする。
+  - 1 箇所で getenv/パース → 全翻訳単位で一貫させる(箱理論: 変換点を 1 箇所に)。
+- 選択肢 B: 制約を緩和(暫定)
+  - ヘッダで ENV を読まない。zero/partial memset を呼ぶかどうかを、C 側の単一関数で判定して呼び出すだけに戻す。
+  - まずセグフォを解消し、memset の最適化は後続フェーズに送る。
+
+## 4. 詳細な調査手順
+- memset 呼び出し元の再確認
+  ```bash
+  rg "memset" core/hakmem_pool.c core/box/pool_api.inc.h
+  ```
+- perf の再取得(C6-heavy LEGACY/flatten なし)
+  ```bash
+  export HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1
+  perf record -F 5000 --call-graph dwarf -e cycles:u -o perf.data.ml1 \
+    ./bench_mid_large_mt_hakmem 1 1000000 400 1
+  perf report -i perf.data.ml1 --stdio | rg memset
+  perf annotate -i perf.data.ml1 __memset_avx2_unaligned_erms | head -40
+  ```
+- 呼び出し階層を掘る(TLS alloc か slow path かを確認。`--call-graph dwarf` で record していればスタックは `perf script` の既定出力に含まれる)
+  ```bash
+  perf script -i perf.data.ml1 | rg -C2 'memset'
+  ```
+
+## 5. 実装の方向性の再検討
+- TLS alloc path で memset が本当に呼ばれているかを必ず確認(`hak_pool_try_alloc_v1_flat` 周辺)。
+- memset が page 初期化のみなら、ZERO_MODE は TLS ring には効かない可能性 → 方針を「page 初期化の頻度を減らす」に切り替えることも検討。
+- ZERO_MODE を入れる場合も:
+  - ENV キャッシュを 1 箇所に集約。
+  - デフォルトは FULL zero、header/off は bench opt-in。
+  - Fail-Fast: 異常 ENV はログして既定値にフォールバック。
+
+## 6. テストコマンド(A/B)
+```bash
+# ベースライン(FULL zero)
+export HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1
+timeout 120 ./bench_mid_large_mt_hakmem 1 1000000 400 1
+
+# header mode(memset を軽量化する実装を入れたら)
+export HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1
+export HAKMEM_POOL_ZERO_MODE=header
+timeout 120 ./bench_mid_large_mt_hakmem 1 1000000 400 1
+```
+- 比較: ops/s, SS/POOL stats(あれば memset 呼び出し数 proxy)、セグフォ/アサートがないこと。
+- header mode で +3〜5% 程度伸びれば成功。負になれば撤回 or slow-path のみに適用。
diff --git a/README_PERF_ANALYSIS.md b/README_PERF_ANALYSIS.md
index 1cc7029b..aef9683f 100644
--- a/README_PERF_ANALYSIS.md
+++ b/README_PERF_ANALYSIS.md
@@ -1,5 +1,7 @@
 # HAKMEM Allocator Performance Analysis Results
 
+標準 Mixed 16–1024B ベンチの ENV は `docs/analysis/ENV_PROFILE_PRESETS.md` の `MIXED_TINYV3_C7_SAFE` プリセットを参照してください。ベンチ実行前に `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE` を export すると自動で適用されます(既存 ENV があればそちらを優先)。
+
 **最新メモ (2025-12-06, Release)**
 - 新規比較表: `PERF_COMPARISON_ALLOCATORS.md` に HAKMEM (full/larson_guard) / mimalloc / system の ops/s と RSS を掲載。C7-only/129–1024/full いずれも HAKMEM は ~50M ops/s / ~29MB RSS、system/mimalloc は 75–126M ops/s / 1.6–1.9MB RSS で優位。
 - Random Mixed 129–1024B, ws=256, iters=1M, `HAKMEM_WARM_TLS_BIND_C7=2`:
diff --git a/bench_random_mixed.c b/bench_random_mixed.c
index 16c81c0b..1e4fce81 100644
--- a/bench_random_mixed.c
+++ b/bench_random_mixed.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include "core/bench_profile.h"
 
 #ifdef USE_HAKMEM
 #include "hakmem.h"
@@ -80,6 +81,8 @@ static inline int bench_is_c6_only_mode(void) {
 }
 
 int main(int argc, char** argv){
+    bench_apply_profile();
+
     int cycles = (argc>1)? atoi(argv[1]) : 10000000;  // total ops (10M for steady-state measurement)
     int ws = (argc>2)? atoi(argv[2]) : 8192;          // working-set slots
     uint32_t seed = (argc>3)? 
(uint32_t)strtoul(argv[3],NULL,10) : 1234567u; diff --git a/core/box/capacity_box.c b/core/box/capacity_box.c index a93787a5..057dd3f0 100644 --- a/core/box/capacity_box.c +++ b/core/box/capacity_box.c @@ -18,7 +18,6 @@ static _Atomic int g_box_cap_initialized = 0; // External declarations (from adaptive_sizing and hakmem_tiny) extern __thread TLSCacheStats g_tls_cache_stats[TINY_NUM_CLASSES]; // TLS variable! extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES]; -extern int g_sll_cap_override[TINY_NUM_CLASSES]; // LEGACY (Phase12以降は参照しない/互換用ダミー) extern int g_sll_multiplier; // ============================================================================ @@ -50,9 +49,7 @@ uint32_t box_cap_get(int class_idx) { } // Compute SLL capacity using same logic as sll_cap_for_class() - // This centralizes the capacity calculation - - // Phase12: g_sll_cap_override はレガシー互換ダミー。capacity_box では無視する。 + // This centralizes the capacity calculation(旧 g_sll_cap_override は削除済み)。 // Get base capacity from adaptive sizing uint32_t cap = g_tls_cache_stats[class_idx].capacity; diff --git a/core/box/carve_push_box.c b/core/box/carve_push_box.c index ac47c881..0407e270 100644 --- a/core/box/carve_push_box.c +++ b/core/box/carve_push_box.c @@ -20,13 +20,14 @@ // External declarations extern __thread TinyTLSSlab g_tls_slabs[TINY_NUM_CLASSES]; extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES]; +extern void ss_active_add(SuperSlab* ss, uint32_t n); // ============================================================================ // Internal Helpers // ============================================================================ // Rollback: return carved blocks to freelist -static void rollback_carved_blocks(int class_idx, TinySlabMeta* meta, +static __attribute__((unused)) void rollback_carved_blocks(int class_idx, TinySlabMeta* meta, void* head, uint32_t count) { // Walk the chain and prepend to freelist void* node = head; diff --git a/core/box/carve_push_box.d b/core/box/carve_push_box.d 
index 82ae970e..3eff0adb 100644 --- a/core/box/carve_push_box.d +++ b/core/box/carve_push_box.d @@ -10,16 +10,18 @@ core/box/carve_push_box.o: core/box/carve_push_box.c \ core/box/../superslab/../tiny_box_geometry.h \ core/box/../superslab/../hakmem_tiny_superslab_constants.h \ core/box/../superslab/../hakmem_tiny_config.h \ + core/box/../superslab/../hakmem_super_registry.h \ + core/box/../superslab/../hakmem_tiny_superslab.h \ + core/box/../superslab/../box/ss_addr_map_box.h \ + core/box/../superslab/../box/../hakmem_build_flags.h \ + core/box/../superslab/../box/super_reg_box.h \ core/box/../tiny_debug_ring.h core/box/../tiny_remote.h \ core/box/../hakmem_tiny_superslab_constants.h \ core/box/../hakmem_tiny_config.h core/box/../hakmem_tiny_superslab.h \ core/box/../hakmem_tiny_integrity.h core/box/../hakmem_tiny.h \ core/box/../tiny_region_id.h core/box/../tiny_box_geometry.h \ - core/box/../ptr_track.h core/box/../hakmem_super_registry.h \ - core/box/../box/ss_addr_map_box.h \ - core/box/../box/../hakmem_build_flags.h core/box/../box/super_reg_box.h \ - core/box/../tiny_debug_api.h core/box/carve_push_box.h \ - core/box/capacity_box.h core/box/tls_sll_box.h \ + core/box/../ptr_track.h core/box/../tiny_debug_api.h \ + core/box/carve_push_box.h core/box/capacity_box.h core/box/tls_sll_box.h \ core/box/../hakmem_internal.h core/box/../hakmem.h \ core/box/../hakmem_config.h core/box/../hakmem_features.h \ core/box/../hakmem_sys.h core/box/../hakmem_whale.h \ @@ -59,6 +61,11 @@ core/box/../superslab/superslab_types.h: core/box/../superslab/../tiny_box_geometry.h: core/box/../superslab/../hakmem_tiny_superslab_constants.h: core/box/../superslab/../hakmem_tiny_config.h: +core/box/../superslab/../hakmem_super_registry.h: +core/box/../superslab/../hakmem_tiny_superslab.h: +core/box/../superslab/../box/ss_addr_map_box.h: +core/box/../superslab/../box/../hakmem_build_flags.h: +core/box/../superslab/../box/super_reg_box.h: core/box/../tiny_debug_ring.h: 
core/box/../tiny_remote.h: core/box/../hakmem_tiny_superslab_constants.h: @@ -69,10 +76,6 @@ core/box/../hakmem_tiny.h: core/box/../tiny_region_id.h: core/box/../tiny_box_geometry.h: core/box/../ptr_track.h: -core/box/../hakmem_super_registry.h: -core/box/../box/ss_addr_map_box.h: -core/box/../box/../hakmem_build_flags.h: -core/box/../box/super_reg_box.h: core/box/../tiny_debug_api.h: core/box/carve_push_box.h: core/box/capacity_box.h: diff --git a/core/box/free_publish_box.d b/core/box/free_publish_box.d index 8cf05ceb..aff407d9 100644 --- a/core/box/free_publish_box.d +++ b/core/box/free_publish_box.d @@ -4,7 +4,12 @@ core/box/free_publish_box.o: core/box/free_publish_box.c \ core/superslab/superslab_inline.h core/superslab/superslab_types.h \ core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ + core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/hakmem_build_flags.h core/tiny_remote.h \ core/hakmem_tiny_superslab_constants.h core/hakmem_tiny.h \ core/hakmem_trace.h core/hakmem_tiny_mini_mag.h \ @@ -20,6 +25,11 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: +core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/hakmem_build_flags.h: core/tiny_remote.h: diff --git a/core/box/hak_alloc_api.inc.h b/core/box/hak_alloc_api.inc.h index 9fd3e83d..538e4e42 100644 --- a/core/box/hak_alloc_api.inc.h +++ 
b/core/box/hak_alloc_api.inc.h @@ -233,6 +233,7 @@ inline void* hak_alloc_at(size_t size, hak_callsite_t site) { atomic_fetch_add(&g_final_fallback_mmap_count, 1); static _Atomic int gap_alloc_count = 0; int count = atomic_fetch_add(&gap_alloc_count, 1); + (void)count; #if !HAKMEM_BUILD_RELEASE if (count < 5) { fprintf(stderr, "[HAKMEM] Phase 2 WARN: Pool/ACE fallback size=%zu (should be rare)\n", size); diff --git a/core/box/hak_core_init.inc.h b/core/box/hak_core_init.inc.h index f015b5fb..66390fff 100644 --- a/core/box/hak_core_init.inc.h +++ b/core/box/hak_core_init.inc.h @@ -2,17 +2,19 @@ #ifndef HAK_CORE_INIT_INC_H #define HAK_CORE_INIT_INC_H -#include -#ifdef __GLIBC__ -#include -#endif #include "hakmem_phase7_config.h" // Phase 7 Task 3 +#include "box/libm_reloc_guard_box.h" +#include "box/init_bench_preset_box.h" +#include "box/init_diag_box.h" +#include "box/init_env_box.h" +#include "../tiny_destructors.h" // Debug-only SIGSEGV handler (gated by HAKMEM_DEBUG_SEGV) static void hakmem_sigsegv_handler(int sig) { (void)sig; const char* msg = "\n[HAKMEM] Segmentation Fault\n"; - (void)write(2, msg, 29); + ssize_t written = write(2, msg, 29); + (void)written; #if !HAKMEM_BUILD_RELEASE // Dump Class 1 (16B) last push info for debugging @@ -37,6 +39,7 @@ void hak_init(void) { } static void hak_init_impl(void) { + libm_reloc_guard_run(); HAK_TRACE("[init_impl_enter]\n"); g_init_thread = pthread_self(); atomic_store_explicit(&g_initializing, 1, memory_order_release); @@ -62,16 +65,7 @@ static void hak_init_impl(void) { } HAK_TRACE("[init_impl_after_jemalloc_probe]\n"); - // Optional: one-shot SIGSEGV backtrace for early crash diagnosis - do { - const char* dbg = getenv("HAKMEM_DEBUG_SEGV"); - if (dbg && atoi(dbg) != 0) { - struct sigaction sa; memset(&sa, 0, sizeof(sa)); - sa.sa_flags = SA_RESETHAND; - sa.sa_handler = hakmem_sigsegv_handler; - sigaction(SIGSEGV, &sa, NULL); - } - } while (0); + box_diag_install_sigsegv_handler(hakmem_sigsegv_handler); // NEW Phase 
6.11.1: Initialize debug timing hkm_timing_init(); @@ -87,145 +81,15 @@ static void hak_init_impl(void) { // Phase 6.16: Initialize FrozenPolicy (SACS-3) hkm_policy_init(); - // Phase 6.15 P0.3: Configure EVO sampling from environment variable - // HAKMEM_EVO_SAMPLE: 0=disabled (default), N=sample every 2^N calls - // Example: HAKMEM_EVO_SAMPLE=10 → sample every 1024 calls - // HAKMEM_EVO_SAMPLE=16 → sample every 65536 calls - char* evo_sample_str = getenv("HAKMEM_EVO_SAMPLE"); - if (evo_sample_str && atoi(evo_sample_str) > 0) { - int freq = atoi(evo_sample_str); - if (freq >= 64) { - HAKMEM_LOG("Warning: HAKMEM_EVO_SAMPLE=%d too large, using 63\n", freq); - freq = 63; - } - g_evo_sample_mask = (1ULL << freq) - 1; - HAKMEM_LOG("EVO sampling enabled: every 2^%d = %llu calls\n", - freq, (unsigned long long)(g_evo_sample_mask + 1)); - } else { - g_evo_sample_mask = 0; // Disabled by default - HAKMEM_LOG("EVO sampling disabled (HAKMEM_EVO_SAMPLE not set or 0)\n"); - } - -#ifdef __linux__ - // Record baseline KPIs - memset(g_latency_histogram, 0, sizeof(g_latency_histogram)); - g_latency_samples = 0; - - get_page_faults(&g_baseline_soft_pf, &g_baseline_hard_pf); - g_baseline_rss_kb = get_rss_kb(); - - HAKMEM_LOG("Baseline: soft_pf=%lu, hard_pf=%lu, rss=%lu KB\n", - (unsigned long)g_baseline_soft_pf, - (unsigned long)g_baseline_hard_pf, - (unsigned long)g_baseline_rss_kb); -#endif + box_init_env_flags(); + box_diag_record_baseline(); HAKMEM_LOG("Initialized (PoC version)\n"); HAKMEM_LOG("Sampling rate: 1/%d\n", SAMPLING_RATE); HAKMEM_LOG("Max sites: %d\n", MAX_SITES); - // Build banner (one-shot) - do { - const char* bf = "UNKNOWN"; -#ifdef HAKMEM_BUILD_RELEASE - bf = "RELEASE"; -#elif defined(HAKMEM_BUILD_DEBUG) - bf = "DEBUG"; -#endif - HAKMEM_LOG("[Build] Flavor=%s Flags: HEADER_CLASSIDX=%d, AGGRESSIVE_INLINE=%d, POOL_TLS_PHASE1=%d, POOL_TLS_PREWARM=%d\n", - bf, -#if HAKMEM_TINY_HEADER_CLASSIDX - 1, -#else - 0, -#endif -#ifdef HAKMEM_TINY_AGGRESSIVE_INLINE - 1, -#else 
- 0, -#endif -#ifdef HAKMEM_POOL_TLS_PHASE1 - 1, -#else - 0, -#endif -#ifdef HAKMEM_POOL_TLS_PREWARM - 1 -#else - 0 -#endif - ); - } while (0); - - // Bench preset: Tiny-only (disable non-essential subsystems) - { - char* bt = getenv("HAKMEM_BENCH_TINY_ONLY"); - if (bt && atoi(bt) != 0) { - g_bench_tiny_only = 1; - } - } - - // Under LD_PRELOAD, enforce safer defaults for Tiny path unless overridden - { - char* ldpre = getenv("LD_PRELOAD"); - if (ldpre && strstr(ldpre, "libhakmem.so")) { - g_ldpreload_mode = 1; - // Default LD-safe mode if not set: 1 (Tiny-only) - char* lds = getenv("HAKMEM_LD_SAFE"); - if (lds) { /* NOP used in wrappers */ } else { setenv("HAKMEM_LD_SAFE", "1", 0); } - if (!getenv("HAKMEM_TINY_TLS_SLL")) { - setenv("HAKMEM_TINY_TLS_SLL", "0", 0); // disable TLS SLL by default - } - if (!getenv("HAKMEM_TINY_USE_SUPERSLAB")) { - setenv("HAKMEM_TINY_USE_SUPERSLAB", "0", 0); // disable SuperSlab path by default - } - } - } - - // Runtime safety toggle - char* safe_free_env = getenv("HAKMEM_SAFE_FREE"); - if (safe_free_env && atoi(safe_free_env) != 0) { - g_strict_free = 1; - HAKMEM_LOG("Strict free safety enabled (HAKMEM_SAFE_FREE=1)\n"); - } else { - // Heuristic: if loaded via LD_PRELOAD, enable strict free by default - char* ldpre = getenv("LD_PRELOAD"); - if (ldpre && strstr(ldpre, "libhakmem.so")) { - g_ldpreload_mode = 1; - g_strict_free = 1; - HAKMEM_LOG("Strict free safety auto-enabled under LD_PRELOAD\n"); - } - } - - // Invalid free logging toggle (default off to avoid spam under LD_PRELOAD) - char* invlog = getenv("HAKMEM_INVALID_FREE_LOG"); - if (invlog && atoi(invlog) != 0) { - g_invalid_free_log = 1; - HAKMEM_LOG("Invalid free logging enabled (HAKMEM_INVALID_FREE_LOG=1)\n"); - } - - // Phase 7.4: Cache HAKMEM_INVALID_FREE to eliminate 44% CPU overhead - // Perf showed getenv() on hot path consumed 43.96% CPU time (26.41% strcmp + 17.55% getenv) - char* inv = getenv("HAKMEM_INVALID_FREE"); - if (inv && strcmp(inv, "skip") == 0) { - 
g_invalid_free_mode = 1; // explicit opt-in to legacy skip mode - HAKMEM_LOG("Invalid free mode: skip check (HAKMEM_INVALID_FREE=skip)\n"); - } else if (inv && strcmp(inv, "fallback") == 0) { - g_invalid_free_mode = 0; // fallback mode: route invalid frees to libc - HAKMEM_LOG("Invalid free mode: fallback to libc (HAKMEM_INVALID_FREE=fallback)\n"); - } else { - // Under LD_PRELOAD, prefer safety: default to fallback unless explicitly overridden - char* ldpre = getenv("LD_PRELOAD"); - if (ldpre && strstr(ldpre, "libhakmem.so")) { - g_ldpreload_mode = 1; - g_invalid_free_mode = 0; - HAKMEM_LOG("Invalid free mode: fallback to libc (auto under LD_PRELOAD)\n"); - } else { - // Default: safety first (fallback), avoids routing unknown pointers into Tiny - g_invalid_free_mode = 0; - HAKMEM_LOG("Invalid free mode: fallback to libc (default)\n"); - } - } + box_diag_print_banner(); + box_init_bench_presets(); // NEW Phase 6.8: Feature-gated initialization (check g_hakem_config flags) if (HAK_ENABLED_ALLOC(HAKMEM_FEATURE_POOL)) { @@ -281,22 +145,8 @@ static void hak_init_impl(void) { // OLD: hak_tiny_init(); (eager init of all 8 classes → 94.94% page faults) // NEW: Lazy init triggered by tiny_alloc_fast() → only used classes initialized - // Env: optional Tiny flush on exit (memory efficiency evaluation) - { - char* tf = getenv("HAKMEM_TINY_FLUSH_ON_EXIT"); - if (tf && atoi(tf) != 0) { - g_flush_tiny_on_exit = 1; - } - char* ud = getenv("HAKMEM_TINY_ULTRA_DEBUG"); - if (ud && atoi(ud) != 0) { - g_ultra_debug_on_exit = 1; - } - // Register exit hook if any of the debug/flush toggles are on - // or when path debug is requested. 
- if (g_flush_tiny_on_exit || g_ultra_debug_on_exit || getenv("HAKMEM_TINY_PATH_DEBUG")) { - atexit(hak_flush_tiny_exit); - } - } + tiny_destructors_configure_from_env(); + tiny_destructors_register_exit(); // NEW Phase ACE: Initialize Adaptive Control Engine hkm_ace_controller_init(&g_ace_controller); @@ -310,6 +160,7 @@ static void hak_init_impl(void) { #if HAKMEM_TINY_PREWARM_TLS #include "box/ss_hot_prewarm_box.h" int total_prewarmed = box_ss_hot_prewarm_all(); + (void)total_prewarmed; HAKMEM_LOG("TLS cache pre-warmed: %d blocks total (Phase 20-1)\n", total_prewarmed); // After TLS prewarm, cascade some hot blocks into SFC to raise early hit rate { diff --git a/core/box/hak_exit_debug.inc.h b/core/box/hak_exit_debug.inc.h deleted file mode 100644 index 260780cc..00000000 --- a/core/box/hak_exit_debug.inc.h +++ /dev/null @@ -1,50 +0,0 @@ -// hak_exit_debug.inc.h — Exit-time Tiny/SS debug dump (one-shot) -#ifndef HAK_EXIT_DEBUG_INC_H -#define HAK_EXIT_DEBUG_INC_H - -static void hak_flush_tiny_exit(void) { - if (g_flush_tiny_on_exit) { - hak_tiny_magazine_flush_all(); - hak_tiny_trim(); - } - if (g_ultra_debug_on_exit) { - hak_tiny_ultra_debug_dump(); - } - // Path debug dump (optional): HAKMEM_TINY_PATH_DEBUG=1 - hak_tiny_path_debug_dump(); - // Extended counters (optional): HAKMEM_TINY_COUNTERS_DUMP=1 - extern void hak_tiny_debug_counters_dump(void); - hak_tiny_debug_counters_dump(); - - // DEBUG: Print SuperSlab accounting stats - extern _Atomic uint64_t g_ss_active_dec_calls; - extern _Atomic uint64_t g_hak_tiny_free_calls; - extern _Atomic uint64_t g_ss_remote_push_calls; - extern _Atomic uint64_t g_free_ss_enter; - extern _Atomic uint64_t g_free_local_box_calls; - extern _Atomic uint64_t g_free_remote_box_calls; - extern uint64_t g_superslabs_allocated; - extern uint64_t g_superslabs_freed; - - fprintf(stderr, "\n[EXIT DEBUG] SuperSlab Accounting:\n"); - fprintf(stderr, " g_superslabs_allocated = %llu\n", (unsigned long long)g_superslabs_allocated); - 
fprintf(stderr, " g_superslabs_freed = %llu\n", (unsigned long long)g_superslabs_freed); - fprintf(stderr, " g_hak_tiny_free_calls = %llu\n", - (unsigned long long)atomic_load_explicit(&g_hak_tiny_free_calls, memory_order_relaxed)); - fprintf(stderr, " g_ss_remote_push_calls = %llu\n", - (unsigned long long)atomic_load_explicit(&g_ss_remote_push_calls, memory_order_relaxed)); - fprintf(stderr, " g_ss_active_dec_calls = %llu\n", - (unsigned long long)atomic_load_explicit(&g_ss_active_dec_calls, memory_order_relaxed)); - extern _Atomic uint64_t g_free_wrapper_calls; - fprintf(stderr, " g_free_wrapper_calls = %llu\n", - (unsigned long long)atomic_load_explicit(&g_free_wrapper_calls, memory_order_relaxed)); - fprintf(stderr, " g_free_ss_enter = %llu\n", - (unsigned long long)atomic_load_explicit(&g_free_ss_enter, memory_order_relaxed)); - fprintf(stderr, " g_free_local_box_calls = %llu\n", - (unsigned long long)atomic_load_explicit(&g_free_local_box_calls, memory_order_relaxed)); - fprintf(stderr, " g_free_remote_box_calls = %llu\n", - (unsigned long long)atomic_load_explicit(&g_free_remote_box_calls, memory_order_relaxed)); -} - -#endif // HAK_EXIT_DEBUG_INC_H - diff --git a/core/box/hak_free_api.inc.h b/core/box/hak_free_api.inc.h index a1e6a50f..ea2d76c1 100644 --- a/core/box/hak_free_api.inc.h +++ b/core/box/hak_free_api.inc.h @@ -167,6 +167,7 @@ void hak_free_at(void* ptr, size_t size, hak_callsite_t site) { } #endif + case FG_DOMAIN_POOL: case FG_DOMAIN_MIDCAND: case FG_DOMAIN_EXTERNAL: // Fall through to registry lookup + AllocHeader dispatch diff --git a/core/box/hak_kpi_util.inc.h b/core/box/hak_kpi_util.inc.h index 0ec38aaa..d2f5a594 100644 --- a/core/box/hak_kpi_util.inc.h +++ b/core/box/hak_kpi_util.inc.h @@ -19,9 +19,10 @@ static void get_page_faults(uint64_t* soft_pf, uint64_t* hard_pf) { if (!f) { *soft_pf = 0; *hard_pf = 0; return; } unsigned long minflt = 0, majflt = 0; unsigned long dummy; char comm[256], state; - (void)fscanf(f, "%lu %s %c %lu %lu 
%lu %lu %lu %lu %lu %lu %lu", - &dummy, comm, &state, &dummy, &dummy, &dummy, &dummy, &dummy, - &dummy, &minflt, &dummy, &majflt); + int stat_ret = fscanf(f, "%lu %s %c %lu %lu %lu %lu %lu %lu %lu %lu %lu", + &dummy, comm, &state, &dummy, &dummy, &dummy, &dummy, &dummy, + &dummy, &minflt, &dummy, &majflt); + (void)stat_ret; fclose(f); *soft_pf = minflt; *hard_pf = majflt; } @@ -30,7 +31,10 @@ static void get_page_faults(uint64_t* soft_pf, uint64_t* hard_pf) { static uint64_t get_rss_kb(void) { FILE* f = fopen("/proc/self/statm", "r"); if (!f) return 0; - unsigned long size, resident; (void)fscanf(f, "%lu %lu", &size, &resident); fclose(f); + unsigned long size, resident; + int statm_ret = fscanf(f, "%lu %lu", &size, &resident); + (void)statm_ret; + fclose(f); long page_size = sysconf(_SC_PAGESIZE); return (resident * page_size) / 1024; // Convert to KB } @@ -69,4 +73,3 @@ void hak_get_kpi(hak_kpi_t* out) { memset(out, 0, sizeof(hak_kpi_t)); } #endif #endif // HAK_KPI_UTIL_INC_H - diff --git a/core/box/hak_wrappers.inc.h b/core/box/hak_wrappers.inc.h index 49c712d0..658eaf59 100644 --- a/core/box/hak_wrappers.inc.h +++ b/core/box/hak_wrappers.inc.h @@ -74,13 +74,18 @@ typedef enum { static _Atomic uint64_t g_fb_counts[FB_REASON_COUNT]; static _Atomic int g_fb_log_count[FB_REASON_COUNT]; +static inline void wrapper_trace_write(const char* msg, size_t len) { + ssize_t w = write(2, msg, len); + (void)w; +} + static inline void wrapper_record_fallback(wrapper_fb_reason_t reason, const char* msg) { atomic_fetch_add_explicit(&g_fb_counts[reason], 1, memory_order_relaxed); const wrapper_env_cfg_t* wcfg = wrapper_env_cfg(); if (__builtin_expect(wcfg->wrap_diag, 0)) { int n = atomic_fetch_add_explicit(&g_fb_log_count[reason], 1, memory_order_relaxed); if (n < 4 && msg) { - write(2, msg, strlen(msg)); + wrapper_trace_write(msg, strlen(msg)); } } } @@ -123,7 +128,7 @@ void* malloc(size_t size) { g_hakmem_lock_depth++; // Debug step trace for 33KB: gated by env 
HAKMEM_STEP_TRACE (default: OFF) const wrapper_env_cfg_t* wcfg = wrapper_env_cfg(); - if (wcfg->step_trace && size == 33000) write(2, "STEP:1 Lock++\n", 14); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("STEP:1 Lock++\n", 14); // Guard against recursion during initialization int init_wait = hak_init_wait_for_ready(); @@ -131,7 +136,7 @@ void* malloc(size_t size) { wrapper_record_fallback(FB_INIT_WAIT_FAIL, "[wrap] libc malloc: init_wait\n"); g_hakmem_lock_depth--; extern void* __libc_malloc(size_t); - if (size == 33000) write(2, "RET:Initializing\n", 17); + if (size == 33000) wrapper_trace_write("RET:Initializing\n", 17); return __libc_malloc(size); } @@ -147,21 +152,21 @@ void* malloc(size_t size) { wrapper_record_fallback(FB_FORCE_LIBC, "[wrap] libc malloc: force_libc\n"); g_hakmem_lock_depth--; extern void* __libc_malloc(size_t); - if (wcfg->step_trace && size == 33000) write(2, "RET:ForceLibc\n", 14); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("RET:ForceLibc\n", 14); return __libc_malloc(size); } - if (wcfg->step_trace && size == 33000) write(2, "STEP:2 ForceLibc passed\n", 24); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("STEP:2 ForceLibc passed\n", 24); int ld_mode = hak_ld_env_mode(); if (ld_mode) { - if (wcfg->step_trace && size == 33000) write(2, "STEP:3 LD Mode\n", 15); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("STEP:3 LD Mode\n", 15); // BUG FIX: g_jemalloc_loaded == -1 (unknown) should not trigger fallback // Only fallback if jemalloc is ACTUALLY loaded (> 0) if (hak_ld_block_jemalloc() && g_jemalloc_loaded > 0) { wrapper_record_fallback(FB_JEMALLOC_BLOCK, "[wrap] libc malloc: jemalloc block\n"); g_hakmem_lock_depth--; extern void* __libc_malloc(size_t); - if (wcfg->step_trace && size == 33000) write(2, "RET:Jemalloc\n", 13); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("RET:Jemalloc\n", 13); return __libc_malloc(size); } if (!g_initialized) { hak_init(); } @@ -170,7 
+175,7 @@ void* malloc(size_t size) { wrapper_record_fallback(FB_INIT_LD_WAIT_FAIL, "[wrap] libc malloc: ld init_wait\n"); g_hakmem_lock_depth--; extern void* __libc_malloc(size_t); - if (wcfg->step_trace && size == 33000) write(2, "RET:Init2\n", 10); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("RET:Init2\n", 10); return __libc_malloc(size); } // Cache HAKMEM_LD_SAFE to avoid repeated getenv on hot path @@ -178,11 +183,11 @@ void* malloc(size_t size) { wrapper_record_fallback(FB_LD_SAFE, "[wrap] libc malloc: ld_safe\n"); g_hakmem_lock_depth--; extern void* __libc_malloc(size_t); - if (wcfg->step_trace && size == 33000) write(2, "RET:LDSafe\n", 11); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("RET:LDSafe\n", 11); return __libc_malloc(size); } } - if (wcfg->step_trace && size == 33000) write(2, "STEP:4 LD Check passed\n", 23); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("STEP:4 LD Check passed\n", 23); // Phase 26: CRITICAL - Ensure initialization before fast path // (fast path bypasses hak_alloc_at, so we need to init here) @@ -196,21 +201,21 @@ void* malloc(size_t size) { // Phase 4-Step3: Use config macro for compile-time optimization // Phase 7-Step1: Changed expect hint from 0→1 (unified path is now LIKELY) if (__builtin_expect(TINY_FRONT_UNIFIED_GATE_ENABLED, 1)) { - if (wcfg->step_trace && size == 33000) write(2, "STEP:5 Unified Gate check\n", 26); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("STEP:5 Unified Gate check\n", 26); if (size <= tiny_get_max_size()) { - if (wcfg->step_trace && size == 33000) write(2, "STEP:5.1 Inside Unified\n", 24); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("STEP:5.1 Inside Unified\n", 24); // Tiny Alloc Gate Box: malloc_tiny_fast() の薄いラッパ // (診断 OFF 時は従来どおりの挙動・コスト) void* ptr = tiny_alloc_gate_fast(size); if (__builtin_expect(ptr != NULL, 1)) { g_hakmem_lock_depth--; - if (wcfg->step_trace && size == 33000) write(2, "RET:TinyFast\n", 13); + if 
(wcfg->step_trace && size == 33000) wrapper_trace_write("RET:TinyFast\n", 13); return ptr; } // Unified Cache miss → fallback to normal path (hak_alloc_at) } } - if (wcfg->step_trace && size == 33000) write(2, "STEP:6 All checks passed\n", 25); + if (wcfg->step_trace && size == 33000) wrapper_trace_write("STEP:6 All checks passed\n", 25); #if !HAKMEM_BUILD_RELEASE if (count > 14250 && count < 14280 && size <= 1024) { diff --git a/core/box/init_bench_preset_box.h b/core/box/init_bench_preset_box.h new file mode 100644 index 00000000..1c6dc2b2 --- /dev/null +++ b/core/box/init_bench_preset_box.h @@ -0,0 +1,14 @@ +// init_bench_preset_box.h — ベンチ用プリセットの箱 +#ifndef INIT_BENCH_PRESET_BOX_H +#define INIT_BENCH_PRESET_BOX_H + +#include + +static inline void box_init_bench_presets(void) { + const char* bt = getenv("HAKMEM_BENCH_TINY_ONLY"); + if (bt && atoi(bt) != 0) { + g_bench_tiny_only = 1; + } +} + +#endif // INIT_BENCH_PRESET_BOX_H diff --git a/core/box/init_diag_box.h b/core/box/init_diag_box.h new file mode 100644 index 00000000..029ad7fc --- /dev/null +++ b/core/box/init_diag_box.h @@ -0,0 +1,71 @@ +// init_diag_box.h — 初期化時の診断(SIGSEGV ハンドラ、ベースライン、ビルドバナー) +#ifndef INIT_DIAG_BOX_H +#define INIT_DIAG_BOX_H + +#include +#include +#include + +// Debug-only SIGSEGV handler (gated by HAKMEM_DEBUG_SEGV) +static inline void box_diag_install_sigsegv_handler(void (*handler)(int)) { + const char* dbg = getenv("HAKMEM_DEBUG_SEGV"); + if (!dbg || atoi(dbg) == 0) return; + + struct sigaction sa; + memset(&sa, 0, sizeof(sa)); + sa.sa_flags = SA_RESETHAND; + sa.sa_handler = handler; + sigaction(SIGSEGV, &sa, NULL); +} + +static inline void box_diag_record_baseline(void) { +#ifdef __linux__ + memset(g_latency_histogram, 0, sizeof(g_latency_histogram)); + g_latency_samples = 0; + + get_page_faults(&g_baseline_soft_pf, &g_baseline_hard_pf); + g_baseline_rss_kb = get_rss_kb(); + + HAKMEM_LOG("Baseline: soft_pf=%lu, hard_pf=%lu, rss=%lu KB\n", + (unsigned long)g_baseline_soft_pf, + 
(unsigned long)g_baseline_hard_pf, + (unsigned long)g_baseline_rss_kb); +#endif +} + +static inline void box_diag_print_banner(void) { + const char* bf = "UNKNOWN"; +#ifdef HAKMEM_BUILD_RELEASE + bf = "RELEASE"; +#elif defined(HAKMEM_BUILD_DEBUG) + bf = "DEBUG"; +#endif + (void)bf; + HAKMEM_LOG( + "[Build] Flavor=%s Flags: HEADER_CLASSIDX=%d, AGGRESSIVE_INLINE=%d, " + "POOL_TLS_PHASE1=%d, POOL_TLS_PREWARM=%d\n", + bf, +#if HAKMEM_TINY_HEADER_CLASSIDX + 1, +#else + 0, +#endif +#ifdef HAKMEM_TINY_AGGRESSIVE_INLINE + 1, +#else + 0, +#endif +#ifdef HAKMEM_POOL_TLS_PHASE1 + 1, +#else + 0, +#endif +#ifdef HAKMEM_POOL_TLS_PREWARM + 1 +#else + 0 +#endif + ); +} + +#endif // INIT_DIAG_BOX_H diff --git a/core/box/init_env_box.h b/core/box/init_env_box.h new file mode 100644 index 00000000..866aa16d --- /dev/null +++ b/core/box/init_env_box.h @@ -0,0 +1,87 @@ +// init_env_box.h — ENV 読み出しと初期フラグ設定の箱 +#ifndef INIT_ENV_BOX_H +#define INIT_ENV_BOX_H + +#include +#include + +static inline void box_init_env_flags(void) { + // Phase 6.15: EVO サンプリング(デフォルト OFF) + const char* evo_sample_str = getenv("HAKMEM_EVO_SAMPLE"); + if (evo_sample_str && atoi(evo_sample_str) > 0) { + int freq = atoi(evo_sample_str); + if (freq >= 64) { + HAKMEM_LOG("Warning: HAKMEM_EVO_SAMPLE=%d too large, using 63\n", freq); + freq = 63; + } + g_evo_sample_mask = (1ULL << freq) - 1; + HAKMEM_LOG("EVO sampling enabled: every 2^%d = %llu calls\n", + freq, (unsigned long long)(g_evo_sample_mask + 1)); + } else { + g_evo_sample_mask = 0; // Disabled by default + HAKMEM_LOG("EVO sampling disabled (HAKMEM_EVO_SAMPLE not set or 0)\n"); + } + + // LD_PRELOAD 配下のセーフモード + { + const char* ldpre = getenv("LD_PRELOAD"); + if (ldpre && strstr(ldpre, "libhakmem.so")) { + g_ldpreload_mode = 1; + // Default LD-safe mode if not set: 1 (Tiny-only) + const char* lds = getenv("HAKMEM_LD_SAFE"); + if (lds) { /* NOP used in wrappers */ } else { setenv("HAKMEM_LD_SAFE", "1", 0); } + if (!getenv("HAKMEM_TINY_TLS_SLL")) { + 
setenv("HAKMEM_TINY_TLS_SLL", "0", 0); // disable TLS SLL by default + } + if (!getenv("HAKMEM_TINY_USE_SUPERSLAB")) { + setenv("HAKMEM_TINY_USE_SUPERSLAB", "0", 0); // disable SuperSlab path by default + } + } + } + + // Runtime safety toggle + const char* safe_free_env = getenv("HAKMEM_SAFE_FREE"); + if (safe_free_env && atoi(safe_free_env) != 0) { + g_strict_free = 1; + HAKMEM_LOG("Strict free safety enabled (HAKMEM_SAFE_FREE=1)\n"); + } else { + // Heuristic: if loaded via LD_PRELOAD, enable strict free by default + const char* ldpre = getenv("LD_PRELOAD"); + if (ldpre && strstr(ldpre, "libhakmem.so")) { + g_ldpreload_mode = 1; + g_strict_free = 1; + HAKMEM_LOG("Strict free safety auto-enabled under LD_PRELOAD\n"); + } + } + + // Invalid free logging toggle (default off to avoid spam under LD_PRELOAD) + const char* invlog = getenv("HAKMEM_INVALID_FREE_LOG"); + if (invlog && atoi(invlog) != 0) { + g_invalid_free_log = 1; + HAKMEM_LOG("Invalid free logging enabled (HAKMEM_INVALID_FREE_LOG=1)\n"); + } + + // Phase 7.4: Cache HAKMEM_INVALID_FREE to eliminate getenv overhead + const char* inv = getenv("HAKMEM_INVALID_FREE"); + if (inv && strcmp(inv, "skip") == 0) { + g_invalid_free_mode = 1; // explicit opt-in to legacy skip mode + HAKMEM_LOG("Invalid free mode: skip check (HAKMEM_INVALID_FREE=skip)\n"); + } else if (inv && strcmp(inv, "fallback") == 0) { + g_invalid_free_mode = 0; // fallback mode: route invalid frees to libc + HAKMEM_LOG("Invalid free mode: fallback to libc (HAKMEM_INVALID_FREE=fallback)\n"); + } else { + // Under LD_PRELOAD, prefer safety: default to fallback unless explicitly overridden + const char* ldpre = getenv("LD_PRELOAD"); + if (ldpre && strstr(ldpre, "libhakmem.so")) { + g_ldpreload_mode = 1; + g_invalid_free_mode = 0; + HAKMEM_LOG("Invalid free mode: fallback to libc (auto under LD_PRELOAD)\n"); + } else { + // Default: safety first (fallback), avoids routing unknown pointers into Tiny + g_invalid_free_mode = 0; + HAKMEM_LOG("Invalid 
free mode: fallback to libc (default)\n"); + } + } +} + +#endif // INIT_ENV_BOX_H diff --git a/core/box/libm_reloc_guard_box.c b/core/box/libm_reloc_guard_box.c new file mode 100644 index 00000000..4c8154ff --- /dev/null +++ b/core/box/libm_reloc_guard_box.c @@ -0,0 +1,190 @@ +// libm_reloc_guard_box.c - Box: libm .fini relocation guard +#include "libm_reloc_guard_box.h" +#include "log_once_box.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#if defined(__linux__) && defined(__x86_64__) + +typedef struct { + uintptr_t base; + int patched; +} libm_reloc_ctx_t; + +static hak_log_once_t g_libm_log_once = HAK_LOG_ONCE_INIT; +static hak_log_once_t g_libm_patch_once = HAK_LOG_ONCE_INIT; +static hak_log_once_t g_libm_fail_once = HAK_LOG_ONCE_INIT; +static _Atomic int g_libm_guard_ran = 0; + +static int libm_reloc_env(const char* name, int default_on) { + const char* e = getenv(name); + if (!e || *e == '\0') { + return default_on; + } + return (*e != '0') ? 
1 : 0; +} + +int libm_reloc_guard_enabled(void) { + static int enabled = -1; + if (__builtin_expect(enabled == -1, 0)) { + enabled = libm_reloc_env("HAKMEM_LIBM_RELOC_GUARD", 1); + } + return enabled; +} + +static int libm_reloc_guard_quiet(void) { + static int quiet = -1; + if (__builtin_expect(quiet == -1, 0)) { + quiet = libm_reloc_env("HAKMEM_LIBM_RELOC_GUARD_QUIET", 0); + } + return quiet; +} + +static int libm_reloc_patch_enabled(void) { + static int patch = -1; + if (__builtin_expect(patch == -1, 0)) { + patch = libm_reloc_env("HAKMEM_LIBM_RELOC_PATCH", 1); + } + return patch; +} + +static int libm_relocate_cb(struct dl_phdr_info* info, size_t size, void* data) { + (void)size; + libm_reloc_ctx_t* ctx = (libm_reloc_ctx_t*)data; + if ((uintptr_t)info->dlpi_addr != ctx->base) { + return 0; + } + + ElfW(Addr) rela_off = 0; + ElfW(Xword) rela_sz = 0; + ElfW(Xword) rela_ent = sizeof(ElfW(Rela)); + uintptr_t relro_start = 0; + size_t relro_size = 0; + + for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) { + const ElfW(Phdr)* ph = &info->dlpi_phdr[i]; + if (ph->p_type == PT_DYNAMIC) { + const ElfW(Dyn)* dyn = (const ElfW(Dyn)*)(info->dlpi_addr + ph->p_vaddr); + for (; dyn->d_tag != DT_NULL; ++dyn) { + switch (dyn->d_tag) { + case DT_RELA: rela_off = dyn->d_un.d_ptr; break; + case DT_RELASZ: rela_sz = dyn->d_un.d_val; break; + case DT_RELAENT: rela_ent = dyn->d_un.d_val; break; + default: break; + } + } + } else if (ph->p_type == PT_GNU_RELRO) { + relro_start = info->dlpi_addr + ph->p_vaddr; + relro_size = ph->p_memsz; + } + } + + if (rela_off == 0 || rela_sz == 0) { + return 1; + } + + size_t page_sz = (size_t)sysconf(_SC_PAGESIZE); + uintptr_t start = relro_start ? 
(relro_start & ~(page_sz - 1)) : 0; + size_t len = 0; + if (relro_size) { + size_t tail = (relro_start - start) + relro_size; + len = (tail + page_sz - 1) & ~(page_sz - 1); + (void)mprotect((void*)start, len, PROT_READ | PROT_WRITE); + } + + ElfW(Rela)* rela = (ElfW(Rela)*)(ctx->base + rela_off); + size_t count = rela_ent ? (rela_sz / rela_ent) : 0; + for (size_t i = 0; i < count; i++) { + if (ELF64_R_TYPE(rela[i].r_info) == R_X86_64_RELATIVE) { + ElfW(Addr)* slot = (ElfW(Addr)*)(ctx->base + rela[i].r_offset); + *slot = ctx->base + rela[i].r_addend; + } + } + + if (len) { + (void)mprotect((void*)start, len, PROT_READ); + } + ctx->patched = 1; + return 1; +} + +static int libm_reloc_apply(uintptr_t base) { + libm_reloc_ctx_t ctx = {.base = base, .patched = 0}; + dl_iterate_phdr(libm_relocate_cb, &ctx); + return ctx.patched; +} + +void libm_reloc_guard_run(void) { + if (!libm_reloc_guard_enabled()) { + return; + } + if (atomic_exchange_explicit(&g_libm_guard_ran, 1, memory_order_relaxed)) { + return; + } + + bool quiet = libm_reloc_guard_quiet() != 0; + Dl_info di = {0}; + if (dladdr((void*)&cos, &di) == 0 || di.dli_fbase == NULL) { + hak_log_once_fprintf(&g_libm_fail_once, quiet, stderr, "[LIBM_RELOC_GUARD] dladdr(libm) failed\n"); + return; + } + + const uintptr_t base = (uintptr_t)di.dli_fbase; + const uintptr_t fini_off = 0xe5d88; // observed .fini_array[0] offset in libm.so.6 + uintptr_t* fini_slot = (uintptr_t*)(base + fini_off); + uintptr_t raw = *fini_slot; + bool relocated = raw >= base; + + hak_log_once_fprintf(&g_libm_log_once, + quiet, + stderr, + "[LIBM_RELOC_GUARD] base=%p slot=%p raw=%p relocated=%d\n", + (void*)di.dli_fbase, + (void*)fini_slot, + (void*)raw, + relocated ? 
1 : 0);
+
+    if (relocated) {
+        return;
+    }
+
+    if (!libm_reloc_patch_enabled()) {
+        hak_log_once_fprintf(&g_libm_patch_once,
+                             quiet,
+                             stderr,
+                             "[LIBM_RELOC_GUARD] unrelocated .fini_array detected (raw=%p); patch disabled\n",
+                             (void*)raw);
+        return;
+    }
+
+    int patched = libm_reloc_apply(base);
+    if (patched) {
+        hak_log_once_fprintf(&g_libm_patch_once,
+                             quiet,
+                             stderr,
+                             "[LIBM_RELOC_GUARD] relocated libm .rela.dyn (base=%p)\n",
+                             (void*)di.dli_fbase);
+    } else {
+        hak_log_once_fprintf(&g_libm_fail_once,
+                             quiet,
+                             stderr,
+                             "[LIBM_RELOC_GUARD] failed to relocate libm (base=%p)\n",
+                             (void*)di.dli_fbase);
+    }
+}
+
+#else // non-linux/x86_64
+
+int libm_reloc_guard_enabled(void) { return 0; }
+void libm_reloc_guard_run(void) {}
+
+#endif
diff --git a/core/box/libm_reloc_guard_box.h b/core/box/libm_reloc_guard_box.h
new file mode 100644
index 00000000..415923f9
--- /dev/null
+++ b/core/box/libm_reloc_guard_box.h
@@ -0,0 +1,11 @@
+// libm_reloc_guard_box.h - Box: libm .fini relocation guard (one-shot)
+// Purpose: detect (and optionally patch) unrelocated libm .fini_array at init
+// Controls: HAKMEM_LIBM_RELOC_GUARD (default: on), HAKMEM_LIBM_RELOC_GUARD_QUIET,
+//          HAKMEM_LIBM_RELOC_PATCH (default: on; set 0 to log-only)
+#ifndef HAKMEM_LIBM_RELOC_GUARD_BOX_H
+#define HAKMEM_LIBM_RELOC_GUARD_BOX_H
+
+int libm_reloc_guard_enabled(void);
+void libm_reloc_guard_run(void);
+
+#endif // HAKMEM_LIBM_RELOC_GUARD_BOX_H
diff --git a/core/box/log_once_box.h b/core/box/log_once_box.h
new file mode 100644
index 00000000..9bdbfb53
--- /dev/null
+++ b/core/box/log_once_box.h
@@ -0,0 +1,41 @@
+// log_once_box.h - Simple one-shot logging helpers (Box)
+// Provides: lightweight, thread-safe "log once" primitives for stderr/write
+// Used by: guard boxes that need single notification without spamming
+#ifndef HAKMEM_LOG_ONCE_BOX_H
+#define HAKMEM_LOG_ONCE_BOX_H
+
+#include <stdarg.h>
+#include <stdatomic.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <unistd.h>
+
+typedef struct {
+    _Atomic int logged;
+}
hak_log_once_t;
+
+#define HAK_LOG_ONCE_INIT {0}
+
+static inline bool hak_log_once_should_log(hak_log_once_t* flag, bool quiet) {
+    if (quiet) return false;
+    if (!flag) return true;
+    return atomic_exchange_explicit(&flag->logged, 1, memory_order_relaxed) == 0;
+}
+
+static inline void hak_log_once_write(hak_log_once_t* flag, bool quiet, int fd, const char* buf, size_t len) {
+    if (!buf) return;
+    if (!hak_log_once_should_log(flag, quiet)) return;
+    (void)write(fd, buf, len);
+}
+
+static inline void hak_log_once_fprintf(hak_log_once_t* flag, bool quiet, FILE* stream, const char* fmt, ...) {
+    if (!stream || !fmt) return;
+    if (!hak_log_once_should_log(flag, quiet)) return;
+    va_list ap;
+    va_start(ap, fmt);
+    (void)vfprintf(stream, fmt, ap);
+    va_end(ap);
+}
+
+#endif // HAKMEM_LOG_ONCE_BOX_H
diff --git a/core/box/madvise_guard_box.c b/core/box/madvise_guard_box.c
new file mode 100644
index 00000000..1fee4dd2
--- /dev/null
+++ b/core/box/madvise_guard_box.c
@@ -0,0 +1,107 @@
+// madvise_guard_box.c - Box: Safe madvise wrapper with DSO guard
+#include "madvise_guard_box.h"
+#include "ss_os_acquire_box.h"
+#include "log_once_box.h"
+
+#include <dlfcn.h>
+#include <errno.h>
+#include <stdatomic.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+
+#if !HAKMEM_BUILD_RELEASE
+static hak_log_once_t g_madvise_bad_ptr_once = HAK_LOG_ONCE_INIT;
+static hak_log_once_t g_madvise_enomem_once = HAK_LOG_ONCE_INIT;
+#endif
+
+static int ss_madvise_guard_env(const char* name, int default_on) {
+    const char* e = getenv(name);
+    if (!e || *e == '\0') {
+        return default_on;
+    }
+    return (*e != '0') ?
1 : 0; +} + +int ss_madvise_guard_enabled(void) { + static int enabled = -1; + if (__builtin_expect(enabled == -1, 0)) { + enabled = ss_madvise_guard_env("HAKMEM_SS_MADVISE_GUARD", 1); + } + return enabled; +} + +int ss_madvise_guard_quiet_logs(void) { + static int quiet = -1; + if (__builtin_expect(quiet == -1, 0)) { + quiet = ss_madvise_guard_env("HAKMEM_SS_MADVISE_GUARD_QUIET", 0); + } + return quiet; +} + +int ss_os_madvise_guarded(void* ptr, size_t len, int advice, const char* where) { + (void)where; + if (!ptr || len == 0) { + return 0; + } + +#if !HAKMEM_BUILD_RELEASE + bool quiet = ss_madvise_guard_quiet_logs() != 0; +#endif + + // Guard can be turned off via env for A/B testing. + if (!ss_madvise_guard_enabled()) { + int ret = madvise(ptr, len, advice); + ss_os_stats_record_madvise(); + return ret; + } + + Dl_info dli = {0}; + if (dladdr(ptr, &dli) != 0 && dli.dli_fname != NULL) { +#if !HAKMEM_BUILD_RELEASE + hak_log_once_fprintf(&g_madvise_bad_ptr_once, + quiet, + stderr, + "[SS_MADVISE_GUARD] skip ptr=%p len=%zu owner=%s\n", + ptr, + len, + dli.dli_fname); +#endif + return 0; + } + + if (atomic_load_explicit(&g_ss_madvise_disabled, memory_order_relaxed)) { + return 0; + } + + int ret = madvise(ptr, len, advice); + ss_os_stats_record_madvise(); + if (ret == 0) { + return 0; + } + + int e = errno; + if (e == ENOMEM) { + atomic_fetch_add_explicit(&g_ss_os_madvise_fail_enomem, 1, memory_order_relaxed); + atomic_store_explicit(&g_ss_madvise_disabled, true, memory_order_relaxed); +#if !HAKMEM_BUILD_RELEASE + hak_log_once_fprintf(&g_madvise_enomem_once, + quiet, + stderr, + "[SS_OS_MADVISE] madvise(advice=%d, ptr=%p, len=%zu) failed with ENOMEM; disabling further madvise\n", + advice, + ptr, + len); +#endif + return 0; // soft fail, do not propagate ENOMEM + } + + atomic_fetch_add_explicit(&g_ss_os_madvise_fail_other, 1, memory_order_relaxed); + errno = e; + if (e == EINVAL) { + return -1; // let caller handle strict mode + } + return 0; +} diff --git 
a/core/box/madvise_guard_box.h b/core/box/madvise_guard_box.h
new file mode 100644
index 00000000..b86e244a
--- /dev/null
+++ b/core/box/madvise_guard_box.h
@@ -0,0 +1,22 @@
+// madvise_guard_box.h - Box: Safe madvise wrapper with DSO guard
+// Responsibility: guard madvise() against DSO/text pointers and handle ENOMEM once
+// Controls: HAKMEM_SS_MADVISE_GUARD (default: on), HAKMEM_SS_MADVISE_GUARD_QUIET
+#ifndef HAKMEM_MADVISE_GUARD_BOX_H
+#define HAKMEM_MADVISE_GUARD_BOX_H
+
+#include <stddef.h>
+
+// Returns 1 when guard is enabled (default), 0 when disabled via env.
+int ss_madvise_guard_enabled(void);
+
+// Returns 1 when guard logging is silenced (HAKMEM_SS_MADVISE_GUARD_QUIET != 0).
+int ss_madvise_guard_quiet_logs(void);
+
+// Guarded madvise:
+// - Skips DSO/text addresses (dladdr hit) to avoid touching .fini_array
+// - ENOMEM: disables future madvise calls (soft fail)
+// - EINVAL: returns -1 so caller can honor STRICT mode
+// - Other errors: increments counters, returns 0
+int ss_os_madvise_guarded(void* ptr, size_t len, int advice, const char* where);
+
+#endif // HAKMEM_MADVISE_GUARD_BOX_H
diff --git a/core/box/mailbox_box.c b/core/box/mailbox_box.c
index 8ab99f56..af1ce926 100644
--- a/core/box/mailbox_box.c
+++ b/core/box/mailbox_box.c
@@ -17,12 +17,12 @@ static _Atomic(uintptr_t) g_pub_mailbox_entries[TINY_NUM_CLASSES][MAILBOX_SHARDS
 static _Atomic(uint32_t) g_pub_mailbox_claimed[TINY_NUM_CLASSES][MAILBOX_SHARDS];
 static _Atomic(uint32_t) g_pub_mailbox_rr[TINY_NUM_CLASSES];
 static _Atomic(uint32_t) g_pub_mailbox_used[TINY_NUM_CLASSES];
-static _Atomic(uint32_t) g_pub_mailbox_scan[TINY_NUM_CLASSES];
+static _Atomic(uint32_t) g_pub_mailbox_scan[TINY_NUM_CLASSES] __attribute__((unused));
 static __thread uint8_t g_tls_mailbox_registered[TINY_NUM_CLASSES];
 static __thread uint8_t g_tls_mailbox_slot[TINY_NUM_CLASSES];
 static int g_mailbox_trace_en = -1;
-static int g_mailbox_trace_limit = 4;
-static _Atomic int g_mailbox_trace_seen[TINY_NUM_CLASSES];
+static int
g_mailbox_trace_limit __attribute__((unused)) = 4; +static _Atomic int g_mailbox_trace_seen[TINY_NUM_CLASSES] __attribute__((unused)); // Optional: periodic slow discovery to widen 'used' even when >0 (A/B) static int g_mailbox_slowdisc_en = -1; // env: HAKMEM_TINY_MAILBOX_SLOWDISC (default ON) static int g_mailbox_slowdisc_period = -1; // env: HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD (default 256) @@ -159,6 +159,9 @@ uintptr_t mailbox_box_peek_one(int class_idx) { } #endif + (void)slow_en; + (void)period; + // Non-destructive peek of first non-zero entry uint32_t used = atomic_load_explicit(&g_pub_mailbox_used[class_idx], memory_order_acquire); for (uint32_t i = 0; i < used; i++) { diff --git a/core/box/mailbox_box.d b/core/box/mailbox_box.d index 6fb77bb8..a2add4fd 100644 --- a/core/box/mailbox_box.d +++ b/core/box/mailbox_box.d @@ -3,7 +3,12 @@ core/box/mailbox_box.o: core/box/mailbox_box.c core/box/mailbox_box.h \ core/hakmem_tiny_superslab_constants.h core/superslab/superslab_inline.h \ core/superslab/superslab_types.h core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ + core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/hakmem_build_flags.h core/tiny_remote.h \ core/hakmem_tiny_superslab_constants.h core/hakmem_tiny.h \ core/hakmem_trace.h core/hakmem_tiny_mini_mag.h \ @@ -18,6 +23,11 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: 
+core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/hakmem_build_flags.h: core/tiny_remote.h: diff --git a/core/box/pool_api.inc.h b/core/box/pool_api.inc.h index da5c11ee..f9e52dfc 100644 --- a/core/box/pool_api.inc.h +++ b/core/box/pool_api.inc.h @@ -5,6 +5,7 @@ #include "pagefault_telemetry_box.h" // Box PageFaultTelemetry (PF_BUCKET_MID) #include "box/pool_hotbox_v2_box.h" #include "box/tiny_heap_env_box.h" // TinyHeap profile (C7_SAFE では flatten を無効化) +#include "box/pool_zero_mode_box.h" // Pool zeroing policy (env cached) // Pool v2 is experimental. Default OFF (use legacy v1 path). static inline int hak_pool_v2_enabled(void) { @@ -62,6 +63,7 @@ static inline int hak_pool_v1_flatten_stats_enabled(void) { return g; } + typedef struct PoolV1FlattenStats { _Atomic uint64_t alloc_tls_hit; _Atomic uint64_t alloc_fallback_v1; diff --git a/core/box/pool_mf2_helpers.inc.h b/core/box/pool_mf2_helpers.inc.h index 37367082..74925bed 100644 --- a/core/box/pool_mf2_helpers.inc.h +++ b/core/box/pool_mf2_helpers.inc.h @@ -28,6 +28,7 @@ static inline bool mf2_try_drain_to_partial(MF2_ThreadPages* tp, int class_idx, // Drain remote frees int drained = mf2_drain_remote_frees(page); + (void)drained; // If page has freelist after drain, add to partial list (LIFO) if (page->freelist) { @@ -102,6 +103,7 @@ static bool mf2_try_drain_active_remotes(MF2_ThreadPages* tp, int class_idx) { if (remote_cnt > 0) { atomic_fetch_add(&g_mf2_slow_found_remote, 1); int drained = mf2_drain_remote_frees(page); + (void)drained; if (drained > 0 && page->freelist) { atomic_fetch_add(&g_mf2_drain_success, 1); return true; // Success! 
Active page now has freelist diff --git a/core/box/pool_zero_mode_box.h b/core/box/pool_zero_mode_box.h new file mode 100644 index 00000000..d6dd58c3 --- /dev/null +++ b/core/box/pool_zero_mode_box.h @@ -0,0 +1,21 @@ +// pool_zero_mode_box.h — Box: Pool zeroing policy (env-cached) +#ifndef POOL_ZERO_MODE_BOX_H +#define POOL_ZERO_MODE_BOX_H + +#include "../hakmem_env_cache.h" // HAK_ENV_POOL_ZERO_MODE + +typedef enum { + POOL_ZERO_MODE_FULL = 0, + POOL_ZERO_MODE_HEADER = 1, + POOL_ZERO_MODE_OFF = 2, +} PoolZeroMode; + +static inline PoolZeroMode hak_pool_zero_mode(void) { + return (PoolZeroMode)HAK_ENV_POOL_ZERO_MODE(); +} + +static inline int hak_pool_zero_header_only(void) { + return hak_pool_zero_mode() == POOL_ZERO_MODE_HEADER; +} + +#endif // POOL_ZERO_MODE_BOX_H diff --git a/core/box/prewarm_box.d b/core/box/prewarm_box.d index 0e29d041..d85f171c 100644 --- a/core/box/prewarm_box.d +++ b/core/box/prewarm_box.d @@ -10,6 +10,11 @@ core/box/prewarm_box.o: core/box/prewarm_box.c core/box/../hakmem_tiny.h \ core/box/../superslab/../tiny_box_geometry.h \ core/box/../superslab/../hakmem_tiny_superslab_constants.h \ core/box/../superslab/../hakmem_tiny_config.h \ + core/box/../superslab/../hakmem_super_registry.h \ + core/box/../superslab/../hakmem_tiny_superslab.h \ + core/box/../superslab/../box/ss_addr_map_box.h \ + core/box/../superslab/../box/../hakmem_build_flags.h \ + core/box/../superslab/../box/super_reg_box.h \ core/box/../tiny_debug_ring.h core/box/../tiny_remote.h \ core/box/../hakmem_tiny_superslab_constants.h \ core/box/../hakmem_tiny_config.h core/box/../hakmem_tiny_superslab.h \ @@ -30,6 +35,11 @@ core/box/../superslab/superslab_types.h: core/box/../superslab/../tiny_box_geometry.h: core/box/../superslab/../hakmem_tiny_superslab_constants.h: core/box/../superslab/../hakmem_tiny_config.h: +core/box/../superslab/../hakmem_super_registry.h: +core/box/../superslab/../hakmem_tiny_superslab.h: +core/box/../superslab/../box/ss_addr_map_box.h: 
+core/box/../superslab/../box/../hakmem_build_flags.h: +core/box/../superslab/../box/super_reg_box.h: core/box/../tiny_debug_ring.h: core/box/../tiny_remote.h: core/box/../hakmem_tiny_superslab_constants.h: diff --git a/core/box/smallobject_hotbox_v3_box.h b/core/box/smallobject_hotbox_v3_box.h index d2fe609a..ba8bde41 100644 --- a/core/box/smallobject_hotbox_v3_box.h +++ b/core/box/smallobject_hotbox_v3_box.h @@ -55,6 +55,7 @@ typedef struct so_stats_class_v3 { _Atomic uint64_t alloc_fallback_v1; _Atomic uint64_t free_calls; _Atomic uint64_t free_fallback_v1; + _Atomic uint64_t page_of_fail; } so_stats_class_v3; // Stats helpers (defined in core/smallobject_hotbox_v3.c) @@ -65,6 +66,7 @@ void so_v3_record_alloc_refill(uint8_t ci); void so_v3_record_alloc_fallback(uint8_t ci); void so_v3_record_free_call(uint8_t ci); void so_v3_record_free_fallback(uint8_t ci); +void so_v3_record_page_of_fail(uint8_t ci); // TLS accessor (core/smallobject_hotbox_v3.c) so_ctx_v3* so_tls_get(void); @@ -72,3 +74,6 @@ so_ctx_v3* so_tls_get(void); // Hot path API (Phase B: stub → always fallback to v1) void* so_alloc(uint32_t class_idx); void so_free(uint32_t class_idx, void* ptr); + +// C7-only pointer membership check (read-only, no state change) +int smallobject_hotbox_v3_can_own_c7(void* ptr); diff --git a/core/box/smallobject_hotbox_v3_env_box.h b/core/box/smallobject_hotbox_v3_env_box.h index 6a6580f0..56937b8b 100644 --- a/core/box/smallobject_hotbox_v3_env_box.h +++ b/core/box/smallobject_hotbox_v3_env_box.h @@ -1,7 +1,8 @@ // smallobject_hotbox_v3_env_box.h - ENV gate for SmallObject HotHeap v3 // 役割: // - HAKMEM_SMALL_HEAP_V3_ENABLED / HAKMEM_SMALL_HEAP_V3_CLASSES をまとめて読む。 -// - デフォルトは C7-only ON(クラスマスク 0x80)。ENV で明示的に 0 を指定した場合のみ v3 を無効化。 +// - デフォルトは C7-only ON(クラスマスク 0x80)。bit7=C7、bit6=C6(research-only, デフォルト OFF)。 +// ENV で明示的に 0 を指定した場合のみ v3 を無効化。 #pragma once #include @@ -45,3 +46,7 @@ static inline int small_heap_v3_class_enabled(uint8_t class_idx) { static inline int 
small_heap_v3_c7_enabled(void) { return small_heap_v3_class_enabled(7); } + +static inline int small_heap_v3_c6_enabled(void) { + return small_heap_v3_class_enabled(6); +} diff --git a/core/box/ss_ace_box.h b/core/box/ss_ace_box.h index c044361d..5aa2bdc8 100644 --- a/core/box/ss_ace_box.h +++ b/core/box/ss_ace_box.h @@ -28,7 +28,7 @@ extern SuperSlabACEState g_ss_ace[TINY_NUM_CLASSES_SS]; // ACE-aware size selection -static inline uint8_t hak_tiny_superslab_next_lg(int class_idx); +uint8_t hak_tiny_superslab_next_lg(int class_idx); // Optional: runtime profile switch for ACE thresholds (index-based). // Profiles are defined in ss_ace_box.c and selected via env or this setter. diff --git a/core/box/ss_addr_map_box.c b/core/box/ss_addr_map_box.c index 94dea18b..65885ab1 100644 --- a/core/box/ss_addr_map_box.c +++ b/core/box/ss_addr_map_box.c @@ -34,12 +34,13 @@ static void free_entry(SSMapEntry* entry) { // Strategy: Mask lower bits based on SuperSlab size // Note: SuperSlab can be 512KB, 1MB, or 2MB // Solution: Try each alignment until we find a valid SuperSlab -static void* get_superslab_base(void* ptr, struct SuperSlab* ss) { +static __attribute__((unused)) void* get_superslab_base(void* ptr, struct SuperSlab* ss) { // SuperSlab stores its own size in header // For now, use conservative approach: align to minimum size (512KB) // Phase 9-1-2: Optimize with actual size from SuperSlab header uintptr_t addr = (uintptr_t)ptr; uintptr_t mask = ~((1UL << SUPERSLAB_LG_MIN) - 1); // 512KB mask + (void)ss; return (void*)(addr & mask); } diff --git a/core/box/ss_os_acquire_box.h b/core/box/ss_os_acquire_box.h index f93ecf61..1d05cd05 100644 --- a/core/box/ss_os_acquire_box.h +++ b/core/box/ss_os_acquire_box.h @@ -21,8 +21,8 @@ #include #include #include -#include -#include + +#include "madvise_guard_box.h" // ============================================================================ // Global Counters (for debugging/diagnostics) @@ -70,52 +70,6 @@ static inline void 
ss_os_stats_record_madvise(void) { } // ============================================================================ -// madvise guard (shared by Superslab hot/cold paths) -// ============================================================================ -// -static inline int ss_os_madvise_guarded(void* ptr, size_t len, int advice, const char* where) { - (void)where; - if (!ptr || len == 0) { - return 0; - } - - if (atomic_load_explicit(&g_ss_madvise_disabled, memory_order_relaxed)) { - return 0; - } - - int ret = madvise(ptr, len, advice); - ss_os_stats_record_madvise(); - if (ret == 0) { - return 0; - } - - int e = errno; - if (e == ENOMEM) { - atomic_fetch_add_explicit(&g_ss_os_madvise_fail_enomem, 1, memory_order_relaxed); - atomic_store_explicit(&g_ss_madvise_disabled, true, memory_order_relaxed); -#if !HAKMEM_BUILD_RELEASE - static _Atomic bool g_ss_madvise_enomem_logged = false; - bool already = atomic_exchange_explicit(&g_ss_madvise_enomem_logged, true, memory_order_relaxed); - if (!already) { - fprintf(stderr, - "[SS_OS_MADVISE] madvise(advice=%d, ptr=%p, len=%zu) failed with ENOMEM " - "(vm.max_map_count reached?). 
Disabling further madvise calls.\n", - advice, ptr, len); - } -#endif - return 0; // soft fail, do not propagate ENOMEM - } - - atomic_fetch_add_explicit(&g_ss_os_madvise_fail_other, 1, memory_order_relaxed); - if (e == EINVAL) { - errno = e; - return -1; // let caller decide (strict mode) - } - errno = e; - return 0; -} - -// ============================================================================ // HugePage Experiment (research-only) // ============================================================================ diff --git a/core/box/ss_release_guard_box.h b/core/box/ss_release_guard_box.h index 652dc170..a76917af 100644 --- a/core/box/ss_release_guard_box.h +++ b/core/box/ss_release_guard_box.h @@ -40,7 +40,7 @@ static inline bool ss_release_guard_slab_can_recycle(SuperSlab* ss, int slab_idx, TinySlabMeta* meta) { - (void)ss; + (void)ss; (void)slab_idx; if (!meta) return false; // Mirror slab_is_empty() from slab_recycling_box.h diff --git a/core/box/superslab_expansion_box.c b/core/box/superslab_expansion_box.c index 90469a6b..1417553c 100644 --- a/core/box/superslab_expansion_box.c +++ b/core/box/superslab_expansion_box.c @@ -7,6 +7,7 @@ #include "superslab_expansion_box.h" #include "../hakmem_tiny_superslab.h" // expand_superslab_head(), g_superslab_heads +#include "../hakmem_tiny_superslab_internal.h" #include "../hakmem_tiny_superslab_constants.h" // SUPERSLAB_SLAB0_DATA_OFFSET #include #include diff --git a/core/box/superslab_expansion_box.d b/core/box/superslab_expansion_box.d index 9568677c..fb3c94a2 100644 --- a/core/box/superslab_expansion_box.d +++ b/core/box/superslab_expansion_box.d @@ -9,9 +9,34 @@ core/box/superslab_expansion_box.o: core/box/superslab_expansion_box.c \ core/box/../superslab/../tiny_box_geometry.h \ core/box/../superslab/../hakmem_tiny_superslab_constants.h \ core/box/../superslab/../hakmem_tiny_config.h \ + core/box/../superslab/../hakmem_super_registry.h \ + core/box/../superslab/../hakmem_tiny_superslab.h \ + 
core/box/../superslab/../box/ss_addr_map_box.h \ + core/box/../superslab/../box/../hakmem_build_flags.h \ + core/box/../superslab/../box/super_reg_box.h \ core/box/../tiny_debug_ring.h core/box/../hakmem_build_flags.h \ core/box/../tiny_remote.h core/box/../hakmem_tiny_superslab_constants.h \ core/box/../hakmem_tiny_superslab.h \ + core/box/../hakmem_tiny_superslab_internal.h \ + core/box/../box/ss_hot_cold_box.h \ + core/box/../box/../superslab/superslab_types.h \ + core/box/../box/ss_allocation_box.h core/hakmem_tiny_superslab.h \ + core/box/../hakmem_debug_master.h core/box/../hakmem_tiny.h \ + core/box/../hakmem_trace.h core/box/../hakmem_tiny_mini_mag.h \ + core/box/../box/hak_lane_classify.inc.h core/box/../box/ptr_type_box.h \ + core/box/../hakmem_tiny_config.h core/box/../hakmem_shared_pool.h \ + core/box/../hakmem_internal.h core/box/../hakmem.h \ + core/box/../hakmem_config.h core/box/../hakmem_features.h \ + core/box/../hakmem_sys.h core/box/../hakmem_whale.h \ + core/box/../tiny_region_id.h core/box/../tiny_box_geometry.h \ + core/box/../ptr_track.h core/box/../tiny_debug_api.h \ + core/box/../hakmem_tiny_integrity.h core/box/../box/tiny_next_ptr_box.h \ + core/hakmem_tiny_config.h core/tiny_nextptr.h core/hakmem_build_flags.h \ + core/tiny_region_id.h core/superslab/superslab_inline.h \ + core/box/tiny_layout_box.h core/box/../hakmem_tiny_config.h \ + core/box/tiny_header_box.h core/box/../hakmem_build_flags.h \ + core/box/tiny_layout_box.h core/box/../tiny_region_id.h \ + core/box/../box/slab_freelist_atomic.h \ core/box/../hakmem_tiny_superslab_constants.h core/box/superslab_expansion_box.h: core/box/../superslab/superslab_types.h: @@ -24,9 +49,51 @@ core/box/../superslab/superslab_types.h: core/box/../superslab/../tiny_box_geometry.h: core/box/../superslab/../hakmem_tiny_superslab_constants.h: core/box/../superslab/../hakmem_tiny_config.h: +core/box/../superslab/../hakmem_super_registry.h: +core/box/../superslab/../hakmem_tiny_superslab.h: 
+core/box/../superslab/../box/ss_addr_map_box.h: +core/box/../superslab/../box/../hakmem_build_flags.h: +core/box/../superslab/../box/super_reg_box.h: core/box/../tiny_debug_ring.h: core/box/../hakmem_build_flags.h: core/box/../tiny_remote.h: core/box/../hakmem_tiny_superslab_constants.h: core/box/../hakmem_tiny_superslab.h: +core/box/../hakmem_tiny_superslab_internal.h: +core/box/../box/ss_hot_cold_box.h: +core/box/../box/../superslab/superslab_types.h: +core/box/../box/ss_allocation_box.h: +core/hakmem_tiny_superslab.h: +core/box/../hakmem_debug_master.h: +core/box/../hakmem_tiny.h: +core/box/../hakmem_trace.h: +core/box/../hakmem_tiny_mini_mag.h: +core/box/../box/hak_lane_classify.inc.h: +core/box/../box/ptr_type_box.h: +core/box/../hakmem_tiny_config.h: +core/box/../hakmem_shared_pool.h: +core/box/../hakmem_internal.h: +core/box/../hakmem.h: +core/box/../hakmem_config.h: +core/box/../hakmem_features.h: +core/box/../hakmem_sys.h: +core/box/../hakmem_whale.h: +core/box/../tiny_region_id.h: +core/box/../tiny_box_geometry.h: +core/box/../ptr_track.h: +core/box/../tiny_debug_api.h: +core/box/../hakmem_tiny_integrity.h: +core/box/../box/tiny_next_ptr_box.h: +core/hakmem_tiny_config.h: +core/tiny_nextptr.h: +core/hakmem_build_flags.h: +core/tiny_region_id.h: +core/superslab/superslab_inline.h: +core/box/tiny_layout_box.h: +core/box/../hakmem_tiny_config.h: +core/box/tiny_header_box.h: +core/box/../hakmem_build_flags.h: +core/box/tiny_layout_box.h: +core/box/../tiny_region_id.h: +core/box/../box/slab_freelist_atomic.h: core/box/../hakmem_tiny_superslab_constants.h: diff --git a/core/box/tiny_alloc_gate_box.h b/core/box/tiny_alloc_gate_box.h index 43c6b2e5..709ebebe 100644 --- a/core/box/tiny_alloc_gate_box.h +++ b/core/box/tiny_alloc_gate_box.h @@ -136,7 +136,7 @@ static inline int tiny_alloc_gate_validate(TinyAllocGateContext* ctx) // - malloc ラッパ (hak_wrappers) から呼ばれる Tiny fast alloc の入口。 // - ルーティングポリシーに基づき Tiny front / Pool fallback を振り分け、 // 診断 ON のときだけ返された USER 
ポインタに対して Bridge + Layout 検査を追加。 -static __attribute__((always_inline)) void* tiny_alloc_gate_fast(size_t size) +static inline void* tiny_alloc_gate_fast(size_t size) { int class_idx = hak_tiny_size_to_class(size); if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) { diff --git a/core/box/tiny_free_gate_box.h b/core/box/tiny_free_gate_box.h index 0017e87e..27886c82 100644 --- a/core/box/tiny_free_gate_box.h +++ b/core/box/tiny_free_gate_box.h @@ -128,7 +128,7 @@ static inline int tiny_free_gate_classify(void* user_ptr, TinyFreeGateContext* c // 戻り値: // 1: Fast Path で処理済み(TLS SLL 等に push 済み) // 0: Slow Path にフォールバックすべき(hak_tiny_free へ) -static __attribute__((always_inline)) int tiny_free_gate_try_fast(void* user_ptr) +static inline int tiny_free_gate_try_fast(void* user_ptr) { #if !HAKMEM_TINY_HEADER_CLASSIDX (void)user_ptr; diff --git a/core/box/tiny_front_cold_box.h b/core/box/tiny_front_cold_box.h index 2aba2a97..6afcfb45 100644 --- a/core/box/tiny_front_cold_box.h +++ b/core/box/tiny_front_cold_box.h @@ -54,8 +54,8 @@ // - Cache refill failure → NULL (fallback to normal path) // - Logs errors in debug builds // -__attribute__((noinline, cold)) -static inline void* tiny_cold_refill_and_alloc(int class_idx) { +__attribute__((noinline, cold, unused)) +static void* tiny_cold_refill_and_alloc(int class_idx) { // Refill cache from SuperSlab (batch allocation) // unified_cache_refill() returns first BASE block (wrapped) hak_base_ptr_t base = unified_cache_refill(class_idx); @@ -107,10 +107,13 @@ static inline void* tiny_cold_refill_and_alloc(int class_idx) { // - Called infrequently (~1-5% of frees) // - Batch drain amortizes cost (e.g., drain 32 objects) // -__attribute__((noinline, cold)) -static inline int tiny_cold_drain_and_free(int class_idx, void* base) { +__attribute__((noinline, cold, unused)) +static int tiny_cold_drain_and_free(int class_idx, void* base) { extern __thread TinyUnifiedCache g_unified_cache[]; TinyUnifiedCache* cache = 
&g_unified_cache[class_idx]; +#if HAKMEM_BUILD_RELEASE + (void)cache; +#endif // TODO: Implement batch drain logic // For now, just reject the free (caller falls back to normal path) @@ -141,8 +144,8 @@ static inline int tiny_cold_drain_and_free(int class_idx, void* base) { // Precondition: Error detected in hot/cold path // Postcondition: Error logged (debug only, zero overhead in release) // -__attribute__((noinline, cold)) -static inline void tiny_cold_report_error(int class_idx, const char* reason) { +__attribute__((noinline, cold, unused)) +static void tiny_cold_report_error(int class_idx, const char* reason) { #if !HAKMEM_BUILD_RELEASE fprintf(stderr, "[COLD_BOX_ERROR] class_idx=%d reason=%s\n", class_idx, reason); fflush(stderr); diff --git a/core/box/tiny_front_v3_env_box.h b/core/box/tiny_front_v3_env_box.h index 016df2cc..51f77740 100644 --- a/core/box/tiny_front_v3_env_box.h +++ b/core/box/tiny_front_v3_env_box.h @@ -25,22 +25,30 @@ typedef struct TinyFrontV3SizeClassEntry { extern TinyFrontV3Snapshot g_tiny_front_v3_snapshot; extern int g_tiny_front_v3_snapshot_ready; -// ENV gate: default OFF +// ENV gate: default ON (set HAKMEM_TINY_FRONT_V3_ENABLED=0 to disable) static inline bool tiny_front_v3_enabled(void) { static int g_enable = -1; if (__builtin_expect(g_enable == -1, 0)) { const char* e = getenv("HAKMEM_TINY_FRONT_V3_ENABLED"); - g_enable = (e && *e && *e != '0') ? 1 : 0; + if (e && *e) { + g_enable = (*e != '0') ? 1 : 0; + } else { + g_enable = 1; // default: ON + } } return g_enable != 0; } -// Optional: size→class LUT gate (default OFF, for A/B) +// Optional: size→class LUT gate (default ON, set HAKMEM_TINY_FRONT_V3_LUT_ENABLED=0 to disable) static inline bool tiny_front_v3_lut_enabled(void) { static int g = -1; if (__builtin_expect(g == -1, 0)) { const char* e = getenv("HAKMEM_TINY_FRONT_V3_LUT_ENABLED"); - g = (e && *e && *e != '0') ? 1 : 0; + if (e && *e) { + g = (*e != '0') ? 
1 : 0;
+        } else {
+            g = 1; // default: ON
+        }
     }
     return g != 0;
 }
@@ -55,6 +63,20 @@ static inline bool tiny_front_v3_route_fast_enabled(void) {
     return g != 0;
 }
 
+// C7 v3 free 専用 ptr fast classify gate (default ON)
+static inline bool tiny_ptr_fast_classify_enabled(void) {
+    static int g = -1;
+    if (__builtin_expect(g == -1, 0)) {
+        const char* e = getenv("HAKMEM_TINY_PTR_FAST_CLASSIFY_ENABLED");
+        if (e && *e) {
+            g = (*e != '0') ? 1 : 0;
+        } else {
+            g = 1; // default: ON (set =0 to disable)
+        }
+    }
+    return g != 0;
+}
+
 // Optional stats gate
 static inline bool tiny_front_v3_stats_enabled(void) {
     static int g = -1;
diff --git a/core/box/tiny_page_box.h b/core/box/tiny_page_box.h
index 3b06df4e..0e204813 100644
--- a/core/box/tiny_page_box.h
+++ b/core/box/tiny_page_box.h
@@ -161,7 +161,6 @@ static inline void tiny_page_box_on_new_slab(int class_idx, TinyTLSSlab* tls)
     SuperSlab* ss = tls->ss;
     TinySlabMeta* meta = tls->meta;
     uint8_t* base = tls->slab_base;
-    int slab_idx = (int)tls->slab_idx;
 
     if (!ss || !meta || !base) {
         return;
diff --git a/core/box/tiny_tls_carve_one_block_box.h b/core/box/tiny_tls_carve_one_block_box.h
index 59d31d10..be1ed0d7 100644
--- a/core/box/tiny_tls_carve_one_block_box.h
+++ b/core/box/tiny_tls_carve_one_block_box.h
@@ -40,7 +40,7 @@ tiny_tls_carve_one_block(TinyTLSSlab* tls, int class_idx)
     TinySlabMeta* meta = tls->meta;
     if (!meta || !tls->ss || tls->slab_base == NULL) return res;
     if (meta->class_idx != (uint8_t)class_idx) return res;
-    if (tls->slab_idx < 0 || tls->slab_idx >= ss_slabs_capacity(tls->ss)) return res;
+    if (tls->slab_idx >= ss_slabs_capacity(tls->ss)) return res;
 
     tiny_class_stats_on_tls_carve_attempt(class_idx);
diff --git a/core/front/malloc_tiny_fast.h b/core/front/malloc_tiny_fast.h
index 41127bad..9ba96cfd 100644
--- a/core/front/malloc_tiny_fast.h
+++ b/core/front/malloc_tiny_fast.h
@@ -229,6 +229,17 @@ static inline int free_tiny_fast(void* ptr) {
     // 4.
Compute BASE and push it to the Unified Cache void* base = (void*)((char*)ptr - 1); tiny_front_free_stat_inc(class_idx); + + // C7 v3 fast classify: bypass classify_ptr/ss_map_lookup for clear hits + if (class_idx == 7 && + tiny_front_v3_enabled() && + tiny_ptr_fast_classify_enabled() && + small_heap_v3_c7_enabled() && + smallobject_hotbox_v3_can_own_c7(base)) { + so_free(7, base); + return 1; + } + tiny_route_kind_t route = tiny_route_for_class((uint8_t)class_idx); const int use_tiny_heap = tiny_route_is_heap_kind(route); const TinyFrontV3Snapshot* front_snap = diff --git a/core/front/tiny_heap_v2.h b/core/front/tiny_heap_v2.h index 43b5bee8..b26c4132 100644 --- a/core/front/tiny_heap_v2.h +++ b/core/front/tiny_heap_v2.h @@ -9,7 +9,21 @@ #include "../hakmem_tiny.h" #include "../box/tls_sll_box.h" +#include "../hakmem_env_cache.h" + +#ifndef TINY_FRONT_TLS_SLL_ENABLED +#define HAK_TINY_TLS_SLL_ENABLED_FALLBACK 1 +#else +#define HAK_TINY_TLS_SLL_ENABLED_FALLBACK TINY_FRONT_TLS_SLL_ENABLED +#endif + +#ifndef TINY_FRONT_HEAP_V2_ENABLED +#define HAK_TINY_HEAP_V2_ENABLED_FALLBACK tiny_heap_v2_enabled() +#else +#define HAK_TINY_HEAP_V2_ENABLED_FALLBACK TINY_FRONT_HEAP_V2_ENABLED +#endif #include +#include // Phase 13-B: Magazine capacity (same as Phase 13-A) #ifndef TINY_HEAP_V2_MAG_CAP @@ -34,6 +48,11 @@ typedef struct { // External TLS variables (defined in hakmem_tiny.c) extern __thread TinyHeapV2Mag g_tiny_heap_v2_mag[TINY_NUM_CLASSES]; extern __thread TinyHeapV2Stats g_tiny_heap_v2_stats[TINY_NUM_CLASSES]; +extern __thread int g_tls_heap_v2_initialized; + +// Backend refill helpers (implemented in Tiny refill path) +int sll_refill_small_from_ss(int class_idx, int max_take); +int sll_refill_batch_from_ss(int class_idx, int max_take); // Enable flag (cached) // ENV: HAKMEM_TINY_FRONT_V2 @@ -132,10 +151,128 @@ static inline int tiny_heap_v2_try_push(int class_idx, void* base) { return 1; // Success } -// Forward declaration: refill + alloc helper (implemented inline where included)
-static inline int tiny_heap_v2_refill_mag(int class_idx); -static inline void* tiny_heap_v2_alloc_by_class(int class_idx); -static inline int tiny_heap_v2_stats_enabled(void); +// Stats gate (ENV cached) +static inline int tiny_heap_v2_stats_enabled(void) { + return HAK_ENV_TINY_HEAP_V2_STATS(); +} + +// TLS HeapV2 initialization barrier (ensures mag->top is zero on first use) +static inline void tiny_heap_v2_ensure_init(void) { + extern __thread int g_tls_heap_v2_initialized; + extern __thread TinyHeapV2Mag g_tiny_heap_v2_mag[]; + + if (__builtin_expect(!g_tls_heap_v2_initialized, 0)) { + for (int i = 0; i < TINY_NUM_CLASSES; i++) { + g_tiny_heap_v2_mag[i].top = 0; + } + g_tls_heap_v2_initialized = 1; + } +} + +// Magazine refill from TLS SLL/backend +static inline int tiny_heap_v2_refill_mag(int class_idx) { + // FIX: Ensure TLS is initialized before first magazine access + tiny_heap_v2_ensure_init(); + if (class_idx < 0 || class_idx > 3) return 0; + if (!tiny_heap_v2_class_enabled(class_idx)) return 0; + + // Phase 7-Step7: Use config macro for dead code elimination in PGO mode + if (!HAK_TINY_TLS_SLL_ENABLED_FALLBACK) return 0; + + TinyHeapV2Mag* mag = &g_tiny_heap_v2_mag[class_idx]; + const int cap = TINY_HEAP_V2_MAG_CAP; + int filled = 0; + + // FIX: Validate mag->top before use (prevent uninitialized TLS corruption) + if (mag->top < 0 || mag->top > cap) { + static __thread int s_reset_logged[TINY_NUM_CLASSES] = {0}; + if (!s_reset_logged[class_idx]) { + fprintf(stderr, "[HEAP_V2_REFILL] C%d mag->top=%d corrupted, reset to 0\n", + class_idx, mag->top); + s_reset_logged[class_idx] = 1; + } + mag->top = 0; + } + + // First, steal from TLS SLL if already available. + while (mag->top < cap) { + void* base = NULL; + if (!tls_sll_pop(class_idx, &base)) break; + mag->items[mag->top++] = base; + filled++; + } + + // If magazine is still empty, ask backend to refill SLL once, then steal again. 
+ if (mag->top < cap && filled == 0) { +#if HAKMEM_TINY_P0_BATCH_REFILL + (void)sll_refill_batch_from_ss(class_idx, cap); +#else + (void)sll_refill_small_from_ss(class_idx, cap); +#endif + while (mag->top < cap) { + void* base = NULL; + if (!tls_sll_pop(class_idx, &base)) break; + mag->items[mag->top++] = base; + filled++; + } + } + + if (__builtin_expect(tiny_heap_v2_stats_enabled(), 0)) { + if (filled > 0) { + g_tiny_heap_v2_stats[class_idx].refill_calls++; + g_tiny_heap_v2_stats[class_idx].refill_blocks += (uint64_t)filled; + } + } + return filled; +} + +// Magazine pop (fast path) +static inline void* tiny_heap_v2_alloc_by_class(int class_idx) { + // FIX: Ensure TLS is initialized before first magazine access + tiny_heap_v2_ensure_init(); + if (class_idx < 0 || class_idx > 3) return NULL; + // Phase 7-Step8: Use config macro for dead code elimination in PGO mode + if (!HAK_TINY_HEAP_V2_ENABLED_FALLBACK) return NULL; + if (!tiny_heap_v2_class_enabled(class_idx)) return NULL; + + TinyHeapV2Mag* mag = &g_tiny_heap_v2_mag[class_idx]; + + // Hit: magazine has entries + if (__builtin_expect(mag->top > 0, 1)) { + // FIX: Add underflow protection before array access + const int cap = TINY_HEAP_V2_MAG_CAP; + if (mag->top > cap || mag->top < 0) { + static __thread int s_reset_logged[TINY_NUM_CLASSES] = {0}; + if (!s_reset_logged[class_idx]) { + fprintf(stderr, "[HEAP_V2_ALLOC] C%d mag->top=%d corrupted, reset to 0\n", + class_idx, mag->top); + s_reset_logged[class_idx] = 1; + } + mag->top = 0; + return NULL; // Fall through to refill path + } + if (__builtin_expect(tiny_heap_v2_stats_enabled(), 0)) { + g_tiny_heap_v2_stats[class_idx].alloc_calls++; + g_tiny_heap_v2_stats[class_idx].mag_hits++; + } + return mag->items[--mag->top]; + } + + // Miss: try single refill from SLL/backend + int filled = tiny_heap_v2_refill_mag(class_idx); + if (filled > 0 && mag->top > 0) { + if (__builtin_expect(tiny_heap_v2_stats_enabled(), 0)) { + 
g_tiny_heap_v2_stats[class_idx].alloc_calls++; + g_tiny_heap_v2_stats[class_idx].mag_hits++; + } + return mag->items[--mag->top]; + } + + if (__builtin_expect(tiny_heap_v2_stats_enabled(), 0)) { + g_tiny_heap_v2_stats[class_idx].backend_oom++; + } + return NULL; +} // Print statistics (called at program exit if HAKMEM_TINY_HEAP_V2_STATS=1, impl in hakmem_tiny.c) void tiny_heap_v2_print_stats(void); diff --git a/core/front/tiny_unified_cache.c b/core/front/tiny_unified_cache.c index e2ab7822..4113fb9c 100644 --- a/core/front/tiny_unified_cache.c +++ b/core/front/tiny_unified_cache.c @@ -379,7 +379,7 @@ static inline int unified_refill_validate_base(int class_idx, const char* stage) { #if HAKMEM_BUILD_RELEASE - (void)class_idx; (void)tls; (void)base; (void)stage; + (void)class_idx; (void)tls; (void)base; (void)stage; (void)meta; return 1; #else if (!base) { diff --git a/core/hakmem.c b/core/hakmem.c index 6a0ea584..13a3656f 100644 --- a/core/hakmem.c +++ b/core/hakmem.c @@ -35,6 +35,8 @@ #include #include #include +#include +#include #include // NEW Phase 6.5: For atomic tick counter #include // Phase 6.15: Threading primitives (recursion guard only) #include // Yield during init wait @@ -59,7 +61,8 @@ static void hakmem_sigsegv_handler_early(int sig) { (void)sig; const char* msg = "\n[HAKMEM] Segmentation Fault (Early Init)\n"; - (void)write(2, msg, 42); + ssize_t written = write(2, msg, 42); + (void)written; abort(); } @@ -77,8 +80,6 @@ _Atomic int g_cached_strategy_id = 0; // Cached strategy ID (updated every wind uint64_t g_evo_sample_mask = 0; // 0 = disabled (default), (1<alloc); HAKMEM_LOG(" cache: 0x%08x\n", fs->cache); diff --git a/core/hakmem_env_cache.h b/core/hakmem_env_cache.h index bb637ad6..47477a4d 100644 --- a/core/hakmem_env_cache.h +++ b/core/hakmem_env_cache.h @@ -94,6 +94,9 @@ typedef struct { // ===== Cold Path: Superslab Madvise (1 variable) ===== int ss_madvise_strict; // HAKMEM_SS_MADVISE_STRICT (default: 1) + // ===== Pool (mid) Zero Mode (1 
variable) ===== + int pool_zero_mode; // HAKMEM_POOL_ZERO_MODE (default: FULL=0) + } HakEnvCache; // Global cache instance (initialized once at startup) @@ -299,6 +302,22 @@ static inline void hakmem_env_cache_init(void) { g_hak_env_cache.ss_madvise_strict = (e && *e && *e == '0') ? 0 : 1; } + // ===== Pool (mid) Zero Mode ===== + { + const char* e = getenv("HAKMEM_POOL_ZERO_MODE"); + if (e && *e) { + if (strcmp(e, "header") == 0) { + g_hak_env_cache.pool_zero_mode = 1; // header-only zero + } else if (strcmp(e, "off") == 0 || strcmp(e, "none") == 0 || strcmp(e, "0") == 0) { + g_hak_env_cache.pool_zero_mode = 2; // zero off + } else { + g_hak_env_cache.pool_zero_mode = 0; // unknown -> default FULL + } + } else { + g_hak_env_cache.pool_zero_mode = 0; // default FULL + } + } + #if !HAKMEM_BUILD_RELEASE // Debug: Print cache summary (stderr only) if (!g_hak_env_cache.quiet) { @@ -374,4 +393,7 @@ static inline void hakmem_env_cache_init(void) { // Cold path: Superslab Madvise #define HAK_ENV_SS_MADVISE_STRICT() (g_hak_env_cache.ss_madvise_strict) +// Pool (mid) Zero Mode +#define HAK_ENV_POOL_ZERO_MODE() (g_hak_env_cache.pool_zero_mode) + #endif // HAKMEM_ENV_CACHE_H diff --git a/core/hakmem_internal.h b/core/hakmem_internal.h index 1f6d1da9..9b207fba 100644 --- a/core/hakmem_internal.h +++ b/core/hakmem_internal.h @@ -342,6 +342,7 @@ static inline void* hak_alloc_mmap_impl(size_t size) { // // Migration: All callers should use hak_super_lookup() instead static inline int hak_is_memory_readable(void* addr) { + (void)addr; // Phase 9: Removed mincore() - assume valid (registry ensures safety) // Callers should use hak_super_lookup() for validation return 1; // Always return true (trust internal metadata) diff --git a/core/hakmem_learner.c b/core/hakmem_learner.c index cb981167..9bc82e59 100644 --- a/core/hakmem_learner.c +++ b/core/hakmem_learner.c @@ -64,9 +64,7 @@ // HAKMEM_LEARN=1 HAKMEM_DYN1_AUTO=1 HAKMEM_CAP_MID_DYN1=64 ./app // // # W_MAX学習(Canary方式で安全に探索) -// 
HAKMEM_LEARN=1 HAKMEM_WMAX_LEARN=1 \ -// HAKMEM_WMAX_CANDIDATES_MID=1.4,1.6,1.8 \ -// HAKMEM_WMAX_CANDIDATES_LARGE=1.3,1.6,2.0 ./app +// HAKMEM_LEARN=1 HAKMEM_WMAX_LEARN=1 HAKMEM_WMAX_CANDIDATES_MID=1.4,1.6,1.8 HAKMEM_WMAX_CANDIDATES_LARGE=1.3,1.6,2.0 ./app // // Notes: // - Learning mode is most effective under high-load workloads @@ -356,8 +354,8 @@ static void* learner_main(void* arg) { if (sum > budget_mid) { while (sum > budget_mid) { // find min need with cap>min_mid - int best_k = -1; double best_need = 1e9; int best_cap=0; - for (int k=0;k budget_lg) { int best=-1; double best_need=1e9; for (int i=0;ilarge_cap[i] <= min_lg) continue; if (need_lg[i] < best_need){ best_need=need_lg[i]; best=i; } } - if (best<0) break; int nv=np->large_cap[best]-step_lg; if (nvlarge_cap[best]=nv; sum=0; for (int i=0;ilarge_cap[i]; + if (best<0) break; + int nv=np->large_cap[best]-step_lg; if (nvlarge_cap[best]=nv; sum=0; for (int i=0;ilarge_cap[i]; } } else if (wf_enabled && sum < budget_lg) { while (sum < budget_lg) { int best=-1; double best_need=-1e9; for (int i=0;i best_need){ best_need=need_lg[i]; best=i; } } - if (best<0) break; np->large_cap[best]+=step_lg; sum += step_lg; + if (best<0) break; + np->large_cap[best]+=step_lg; sum += step_lg; } } } diff --git a/core/hakmem_phase7_config.h b/core/hakmem_phase7_config.h index 86ea5b2d..72eaf66d 100644 --- a/core/hakmem_phase7_config.h +++ b/core/hakmem_phase7_config.h @@ -124,14 +124,14 @@ // make phase7-bench // // 3. Phase 7 full build: -// make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 \ +// make HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 PREWARM_TLS=1 // bench_random_mixed_hakmem larson_hakmem // // 4.
PGO build (Task 4): // make PROFILE_GEN=1 bench_random_mixed_hakmem // ./bench_random_mixed_hakmem 100000 128 1234567 # collect profile data // make clean -// make PROFILE_USE=1 HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 \ +// make PROFILE_USE=1 HEADER_CLASSIDX=1 AGGRESSIVE_INLINE=1 // bench_random_mixed_hakmem #endif // HAKMEM_PHASE7_CONFIG_H diff --git a/core/hakmem_pool.c b/core/hakmem_pool.c index 3e305bcc..62d0d8ac 100644 --- a/core/hakmem_pool.c +++ b/core/hakmem_pool.c @@ -49,6 +49,7 @@ #include "box/pool_hotbox_v2_header_box.h" #include "hakmem_syscall.h" // Box 3 syscall layer (bypasses LD_PRELOAD) #include "box/pool_hotbox_v2_box.h" +#include "box/pool_zero_mode_box.h" // Zeroing policy (env cached) #include #include #include @@ -209,22 +210,6 @@ static inline MidPage* mf2_addr_to_page(void* addr) { // Step 3: Direct lookup (no hash collision handling needed with 64K entries) MidPage* page = g_mf2_page_registry.pages[idx]; - // ALIGNMENT VERIFICATION (Step 3) - Sample first 100 lookups - static _Atomic int lookup_count = 0; - // DEBUG: Disabled for performance - // int count = atomic_fetch_add_explicit(&lookup_count, 1, memory_order_relaxed); - // if (count < 100) { - // int found = (page != NULL); - // int match = (page && page->base == page_base); - // fprintf(stderr, "[LOOKUP %d] addr=%p → page_base=%p → idx=%zu → found=%s", - // count, addr, page_base, idx, found ? "YES" : "NO"); - // if (page) { - // fprintf(stderr, ", page->base=%p, match=%s", - // page->base, match ?
"YES" : "NO"); - // } - // fprintf(stderr, "\n"); - // } - // Validation: Ensure page base matches (handles potential collisions) if (page && page->base == page_base) { return page; @@ -350,9 +335,12 @@ static MidPage* mf2_alloc_new_page(int class_idx) { page_base, ((uintptr_t)page_base & 0xFFFF)); } - // Zero-fill (required for posix_memalign) - // Note: This adds ~15μs overhead, but is necessary for correctness - memset(page_base, 0, POOL_PAGE_SIZE); + PoolZeroMode zero_mode = hak_pool_zero_mode(); + // Zero-fill (default) or relax based on ENV gate (POOL_ZERO_MODE_HEADER/OFF). + // mmap() already returns zeroed pages; this gate controls additional zeroing overhead. + if (zero_mode == POOL_ZERO_MODE_FULL) { + memset(page_base, 0, POOL_PAGE_SIZE); + } // Step 2: Allocate MidPage descriptor MidPage* page = (MidPage*)hkm_libc_calloc(1, sizeof(MidPage)); @@ -386,6 +374,10 @@ static MidPage* mf2_alloc_new_page(int class_idx) { char* block_addr = (char*)page_base + (i * block_size); PoolBlock* block = (PoolBlock*)block_addr; + if (zero_mode == POOL_ZERO_MODE_HEADER) { + memset(block, 0, HEADER_SIZE); + } + block->next = NULL; if (freelist_head == NULL) { diff --git a/core/hakmem_shared_pool.c b/core/hakmem_shared_pool.c index b135c0d1..a683b0bc 100644 --- a/core/hakmem_shared_pool.c +++ b/core/hakmem_shared_pool.c @@ -305,6 +305,7 @@ shared_pool_init(void) // Find first unused slot in SharedSSMeta // P0-5: Uses atomic load for state check // Returns: slot_idx on success, -1 if no unused slots +static int sp_slot_find_unused(SharedSSMeta* meta) __attribute__((unused)); static int sp_slot_find_unused(SharedSSMeta* meta) { if (!meta) return -1; @@ -484,6 +485,7 @@ SharedSSMeta* sp_meta_find_or_create(SuperSlab* ss) { // Find UNUSED slot and claim it (UNUSED → ACTIVE) using lock-free CAS // Returns: slot_idx on success, -1 if no UNUSED slots int sp_slot_claim_lockfree(SharedSSMeta* meta, int class_idx) { + (void)class_idx; if (!meta) return -1; // Optimization: Quick check 
if any unused slots exist? diff --git a/core/hakmem_shared_pool_release.c b/core/hakmem_shared_pool_release.c index 26150f19..04a661ba 100644 --- a/core/hakmem_shared_pool_release.c +++ b/core/hakmem_shared_pool_release.c @@ -87,6 +87,7 @@ shared_pool_release_slab(SuperSlab* ss, int slab_idx) #else static const int dbg = 0; #endif + (void)dbg; // P0 instrumentation: count lock acquisitions lock_stats_init(); diff --git a/core/hakmem_super_registry.c b/core/hakmem_super_registry.c index d967d798..3406ef53 100644 --- a/core/hakmem_super_registry.c +++ b/core/hakmem_super_registry.c @@ -150,6 +150,7 @@ void hak_super_unregister(uintptr_t base) { #else static const int dbg_once = 0; #endif + (void)dbg_once; if (!g_super_reg_initialized) return; pthread_mutex_lock(&g_super_reg_lock); @@ -365,6 +366,7 @@ static int ss_lru_evict_one(void) { // Unregister and free uintptr_t base = (uintptr_t)victim; + (void)base; // Debug logging for LRU EVICT if (dbg == 1) { diff --git a/core/hakmem_tiny.c b/core/hakmem_tiny.c index e3694b7e..1d147cf1 100644 --- a/core/hakmem_tiny.c +++ b/core/hakmem_tiny.c @@ -37,6 +37,7 @@ #include "box/super_reg_box.h" #include "tiny_region_id.h" #include "tiny_debug_api.h" +#include "tiny_destructors.h" #include "hakmem_tiny_tls_list.h" #include "hakmem_tiny_remote_target.h" // Phase 2C-1: Remote target queue #include "hakmem_tiny_bg_spill.h" // Phase 2C-2: Background spill queue @@ -72,16 +73,6 @@ static int g_tiny_front_v3_lut_ready = 0; // Forward decls (to keep deps light in this TU) int unified_cache_enabled(void); -static int tiny_heap_stats_dump_enabled(void) { - static int g = -1; - if (__builtin_expect(g == -1, 0)) { - const char* eh = getenv("HAKMEM_TINY_HEAP_STATS_DUMP"); - const char* e = getenv("HAKMEM_TINY_C7_HEAP_STATS_DUMP"); - g = ((eh && *eh && *eh != '0') || (e && *e && *e != '0')) ? 
1 : 0; - } - return g; -} - void tiny_front_v3_snapshot_init(void) { if (g_tiny_front_v3_snapshot_ready) { return; @@ -135,123 +126,31 @@ const TinyFrontV3SizeClassEntry* tiny_front_v3_lut_lookup(size_t size) { return &g_tiny_front_v3_lut[size]; } -__attribute__((destructor)) -static void tiny_heap_stats_dump(void) { - if (!tiny_heap_stats_enabled() || !tiny_heap_stats_dump_enabled()) { - return; - } - for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) { - TinyHeapClassStats snap = { - .alloc_fast_current = atomic_load_explicit(&g_tiny_heap_stats[cls].alloc_fast_current, memory_order_relaxed), - .alloc_slow_prepare = atomic_load_explicit(&g_tiny_heap_stats[cls].alloc_slow_prepare, memory_order_relaxed), - .free_fast_local = atomic_load_explicit(&g_tiny_heap_stats[cls].free_fast_local, memory_order_relaxed), - .free_slow_fallback = atomic_load_explicit(&g_tiny_heap_stats[cls].free_slow_fallback, memory_order_relaxed), - .alloc_prepare_fail = atomic_load_explicit(&g_tiny_heap_stats[cls].alloc_prepare_fail, memory_order_relaxed), - .alloc_fail = atomic_load_explicit(&g_tiny_heap_stats[cls].alloc_fail, memory_order_relaxed), - }; - if (snap.alloc_fast_current == 0 && snap.alloc_slow_prepare == 0 && - snap.free_fast_local == 0 && snap.free_slow_fallback == 0 && - snap.alloc_prepare_fail == 0 && snap.alloc_fail == 0) { - continue; - } - fprintf(stderr, - "[HEAP_STATS cls=%d] alloc_fast_current=%llu alloc_slow_prepare=%llu free_fast_local=%llu free_slow_fallback=%llu alloc_prepare_fail=%llu alloc_fail=%llu\n", - cls, - (unsigned long long)snap.alloc_fast_current, - (unsigned long long)snap.alloc_slow_prepare, - (unsigned long long)snap.free_fast_local, - (unsigned long long)snap.free_slow_fallback, - (unsigned long long)snap.alloc_prepare_fail, - (unsigned long long)snap.alloc_fail); - } - TinyC7PageStats ps = { - .prepare_calls = atomic_load_explicit(&g_c7_page_stats.prepare_calls, memory_order_relaxed), - .prepare_with_current_null = 
atomic_load_explicit(&g_c7_page_stats.prepare_with_current_null, memory_order_relaxed), - .prepare_from_partial = atomic_load_explicit(&g_c7_page_stats.prepare_from_partial, memory_order_relaxed), - .current_set_from_free = atomic_load_explicit(&g_c7_page_stats.current_set_from_free, memory_order_relaxed), - .current_dropped_to_partial = atomic_load_explicit(&g_c7_page_stats.current_dropped_to_partial, memory_order_relaxed), - }; - if (ps.prepare_calls || ps.prepare_with_current_null || ps.prepare_from_partial || - ps.current_set_from_free || ps.current_dropped_to_partial) { - fprintf(stderr, - "[C7_PAGE_STATS] prepare_calls=%llu prepare_with_current_null=%llu prepare_from_partial=%llu current_set_from_free=%llu current_dropped_to_partial=%llu\n", - (unsigned long long)ps.prepare_calls, - (unsigned long long)ps.prepare_with_current_null, - (unsigned long long)ps.prepare_from_partial, - (unsigned long long)ps.current_set_from_free, - (unsigned long long)ps.current_dropped_to_partial); - fflush(stderr); - } -} - -__attribute__((destructor)) -static void tiny_front_class_stats_dump(void) { - if (!tiny_front_class_stats_dump_enabled()) { - return; - } - for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) { - uint64_t a = atomic_load_explicit(&g_tiny_front_alloc_class[cls], memory_order_relaxed); - uint64_t f = atomic_load_explicit(&g_tiny_front_free_class[cls], memory_order_relaxed); - if (a == 0 && f == 0) { - continue; - } - fprintf(stderr, "[FRONT_CLASS cls=%d] alloc=%llu free=%llu\n", - cls, (unsigned long long)a, (unsigned long long)f); - } -} - -__attribute__((destructor)) -static void tiny_c7_delta_debug_destructor(void) { - if (tiny_c7_meta_light_enabled() && tiny_c7_delta_debug_enabled()) { - tiny_c7_heap_debug_dump_deltas(); - } - if (tiny_heap_meta_light_enabled_for_class(6) && tiny_c6_delta_debug_enabled()) { - tiny_c6_heap_debug_dump_deltas(); - } -} - // ============================================================================= // TinyHotHeap v2 
(Phase30/31 wiring). Currently C7-only thin wrapper. // NOTE: As of Phase34/35, v2 is slower than v1 even in the C7-only case, and shows a large regression on mixed workloads. // Intended for use only when the experimental flag is explicitly turned ON; v1 remains the recommended default. // ============================================================================= -static inline int tiny_hotheap_v2_stats_enabled(void) { - static int g = -1; - if (__builtin_expect(g == -1, 0)) { - const char* e = getenv("HAKMEM_TINY_HOTHEAP_V2_STATS"); - g = (e && *e && *e != '0') ? 1 : 0; - } - return g; -} +_Atomic uint64_t g_tiny_hotheap_v2_route_hits[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_alloc_calls[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_alloc_fast[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_alloc_lease[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_alloc_fallback_v1[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_alloc_refill[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_refill_with_current[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_refill_with_partial[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_alloc_route_fb[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_free_calls[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_free_fast[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_free_fallback_v1[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_cold_refill_fail[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_cold_retire_calls[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_retire_calls_v2[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_partial_pushes[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_partial_pops[TINY_HOTHEAP_MAX_CLASSES] = {0}; +_Atomic uint64_t g_tiny_hotheap_v2_partial_peak[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic
uint64_t g_tiny_hotheap_v2_route_hits[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_alloc_calls[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_alloc_fast[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_alloc_lease[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_alloc_fallback_v1[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_alloc_refill[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_refill_with_current[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_refill_with_partial[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_alloc_route_fb[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_free_calls[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_free_fast[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_free_fallback_v1[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_cold_refill_fail[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_cold_retire_calls[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_retire_calls_v2[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_partial_pushes[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_partial_pops[TINY_HOTHEAP_MAX_CLASSES] = {0}; -static _Atomic uint64_t g_tiny_hotheap_v2_partial_peak[TINY_HOTHEAP_MAX_CLASSES] = {0}; - -typedef struct { - _Atomic uint64_t prepare_calls; - _Atomic uint64_t prepare_with_current_null; - _Atomic uint64_t prepare_from_partial; - _Atomic uint64_t free_made_current; - _Atomic uint64_t page_retired; -} TinyHotHeapV2PageStats; - -static TinyHotHeapV2PageStats g_tiny_hotheap_v2_page_stats[TINY_HOTHEAP_MAX_CLASSES] = {0}; +TinyHotHeapV2PageStats 
g_tiny_hotheap_v2_page_stats[TINY_HOTHEAP_MAX_CLASSES] = {0}; static void tiny_hotheap_v2_page_retire_slow(tiny_hotheap_ctx_v2* ctx, uint8_t class_idx, tiny_hotheap_page_v2* page); @@ -588,73 +487,6 @@ static inline void* tiny_hotheap_v2_try_pop(tiny_hotheap_class_v2* hc, return tiny_region_id_write_header(block, class_idx); } -__attribute__((destructor)) -static void tiny_hotheap_v2_stats_dump(void) { - if (!tiny_hotheap_v2_stats_enabled()) { - return; - } - for (uint8_t ci = 0; ci < TINY_HOTHEAP_MAX_CLASSES; ci++) { - uint64_t alloc_calls = atomic_load_explicit(&g_tiny_hotheap_v2_alloc_calls[ci], memory_order_relaxed); - uint64_t route_hits = atomic_load_explicit(&g_tiny_hotheap_v2_route_hits[ci], memory_order_relaxed); - uint64_t alloc_fast = atomic_load_explicit(&g_tiny_hotheap_v2_alloc_fast[ci], memory_order_relaxed); - uint64_t alloc_lease = atomic_load_explicit(&g_tiny_hotheap_v2_alloc_lease[ci], memory_order_relaxed); - uint64_t alloc_fb = atomic_load_explicit(&g_tiny_hotheap_v2_alloc_fallback_v1[ci], memory_order_relaxed); - uint64_t free_calls = atomic_load_explicit(&g_tiny_hotheap_v2_free_calls[ci], memory_order_relaxed); - uint64_t free_fast = atomic_load_explicit(&g_tiny_hotheap_v2_free_fast[ci], memory_order_relaxed); - uint64_t free_fb = atomic_load_explicit(&g_tiny_hotheap_v2_free_fallback_v1[ci], memory_order_relaxed); - uint64_t cold_refill_fail = atomic_load_explicit(&g_tiny_hotheap_v2_cold_refill_fail[ci], memory_order_relaxed); - uint64_t cold_retire_calls = atomic_load_explicit(&g_tiny_hotheap_v2_cold_retire_calls[ci], memory_order_relaxed); - uint64_t retire_calls_v2 = atomic_load_explicit(&g_tiny_hotheap_v2_retire_calls_v2[ci], memory_order_relaxed); - uint64_t partial_pushes = atomic_load_explicit(&g_tiny_hotheap_v2_partial_pushes[ci], memory_order_relaxed); - uint64_t partial_pops = atomic_load_explicit(&g_tiny_hotheap_v2_partial_pops[ci], memory_order_relaxed); - uint64_t partial_peak = 
atomic_load_explicit(&g_tiny_hotheap_v2_partial_peak[ci], memory_order_relaxed); - uint64_t refill_with_cur = atomic_load_explicit(&g_tiny_hotheap_v2_refill_with_current[ci], memory_order_relaxed); - uint64_t refill_with_partial = atomic_load_explicit(&g_tiny_hotheap_v2_refill_with_partial[ci], memory_order_relaxed); - - TinyHotHeapV2PageStats ps = { - .prepare_calls = atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].prepare_calls, memory_order_relaxed), - .prepare_with_current_null = atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].prepare_with_current_null, memory_order_relaxed), - .prepare_from_partial = atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].prepare_from_partial, memory_order_relaxed), - .free_made_current = atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].free_made_current, memory_order_relaxed), - .page_retired = atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].page_retired, memory_order_relaxed), - }; - - if (!(alloc_calls || alloc_fast || alloc_lease || alloc_fb || free_calls || free_fast || free_fb || - ps.prepare_calls || ps.prepare_with_current_null || ps.prepare_from_partial || - ps.free_made_current || ps.page_retired || retire_calls_v2 || partial_pushes || partial_pops || partial_peak)) { - continue; - } - - tiny_route_kind_t route_kind = tiny_route_for_class(ci); - fprintf(stderr, - "[HOTHEAP_V2_STATS cls=%u route=%d] route_hits=%llu alloc_calls=%llu alloc_fast=%llu alloc_lease=%llu alloc_refill=%llu refill_cur=%llu refill_partial=%llu alloc_fb_v1=%llu alloc_route_fb=%llu cold_refill_fail=%llu cold_retire_calls=%llu retire_v2=%llu free_calls=%llu free_fast=%llu free_fb_v1=%llu prep_calls=%llu prep_null=%llu prep_from_partial=%llu free_made_current=%llu page_retired=%llu partial_push=%llu partial_pop=%llu partial_peak=%llu\n", - (unsigned)ci, - (int)route_kind, - (unsigned long long)route_hits, - (unsigned long long)alloc_calls, - (unsigned long long)alloc_fast, - (unsigned long long)alloc_lease, - (unsigned 
long long)atomic_load_explicit(&g_tiny_hotheap_v2_alloc_refill[ci], memory_order_relaxed), - (unsigned long long)refill_with_cur, - (unsigned long long)refill_with_partial, - (unsigned long long)alloc_fb, - (unsigned long long)atomic_load_explicit(&g_tiny_hotheap_v2_alloc_route_fb[ci], memory_order_relaxed), - (unsigned long long)cold_refill_fail, - (unsigned long long)cold_retire_calls, - (unsigned long long)retire_calls_v2, - (unsigned long long)free_calls, - (unsigned long long)free_fast, - (unsigned long long)free_fb, - (unsigned long long)ps.prepare_calls, - (unsigned long long)ps.prepare_with_current_null, - (unsigned long long)ps.prepare_from_partial, - (unsigned long long)ps.free_made_current, - (unsigned long long)ps.page_retired, - (unsigned long long)partial_pushes, - (unsigned long long)partial_pops, - (unsigned long long)partial_peak); - } -} tiny_hotheap_ctx_v2* tiny_hotheap_v2_tls_get(void) { tiny_hotheap_ctx_v2* ctx = g_tiny_hotheap_ctx_v2; if (__builtin_expect(ctx == NULL, 0)) { @@ -890,7 +722,6 @@ static inline int sll_refill_small_from_ss(int class_idx, int max_take); #endif #endif static inline void hak_tiny_free_superslab(void* ptr, SuperSlab* ss); -static void* __attribute__((cold, noinline)) tiny_slow_alloc_fast(int class_idx); static inline void tiny_remote_drain_owner(struct TinySlab* slab); static void tiny_remote_drain_locked(struct TinySlab* slab); // Ultra-fast try-only variant: attempt a direct SuperSlab bump/freelist pop @@ -944,9 +775,9 @@ SuperSlab* adopt_gate_try(int class_idx, TinyTLSSlab* tls) { } int scan_limit = tiny_reg_scan_max(); if (scan_limit > reg_size) scan_limit = reg_size; - uint32_t self_tid = tiny_self_u32(); // Local helper (mirror adopt_bind_if_safe) to avoid including alloc inline here auto int adopt_bind_if_safe_local(TinyTLSSlab* tls_l, SuperSlab* ss, int slab_idx, int class_idx_l) { + (void)class_idx_l; uint32_t self_tid = tiny_self_u32(); SlabHandle h = slab_try_acquire(ss, slab_idx, self_tid); if 
(!slab_is_valid(&h)) return 0; @@ -1011,14 +842,6 @@ static inline int fastcache_push(int class_idx, hak_base_ptr_t ptr); // 88 lines (lines 407-494) -// ============================================================================ -// Legacy Slow Allocation Path - ARCHIVED -// ============================================================================ -// Note: tiny_slow_alloc_fast() and related legacy slow path implementation -// have been moved to archive/hakmem_tiny_legacy_slow_box.inc and are no -// longer compiled. The current slow path uses Box化された hak_tiny_alloc_slow(). - - // ============================================================================ // EXTRACTED TO hakmem_tiny_refill.inc.h (Phase 2D-1) // ============================================================================ @@ -1391,6 +1214,9 @@ extern __thread int g_tls_in_wrapper; // Phase 2D-4 (FINAL): Slab management functions (142 lines total) #include "hakmem_tiny_slab_mgmt.inc" +// Size→class routing for >=1024B (env: HAKMEM_TINY_ALLOC_1024_METRIC) +_Atomic uint64_t g_tiny_alloc_ge1024[TINY_NUM_CLASSES] = {0}; + // Tiny Heap v2 stats dump (opt-in) void tiny_heap_v2_print_stats(void) { // Priority-2: Use cached ENV @@ -1412,47 +1238,6 @@ void tiny_heap_v2_print_stats(void) { } } -static void tiny_heap_v2_stats_atexit(void) __attribute__((destructor)); -static void tiny_heap_v2_stats_atexit(void) { - tiny_heap_v2_print_stats(); -} - -// Size→class routing for >=1024B (env: HAKMEM_TINY_ALLOC_1024_METRIC) -_Atomic uint64_t g_tiny_alloc_ge1024[TINY_NUM_CLASSES] = {0}; -static void tiny_alloc_1024_diag_atexit(void) __attribute__((destructor)); -static void tiny_alloc_1024_diag_atexit(void) { - // Priority-2: Use cached ENV - if (!HAK_ENV_TINY_ALLOC_1024_METRIC()) return; - fprintf(stderr, "\n[ALLOC_GE1024] per-class counts (size>=1024)\n"); - for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) { - uint64_t v = atomic_load_explicit(&g_tiny_alloc_ge1024[cls], memory_order_relaxed); - if (v) { - 
fprintf(stderr, " C%d=%llu", cls, (unsigned long long)v); - } - } - fprintf(stderr, "\n"); -} - -// TLS SLL pointer diagnostics (optional) -extern _Atomic uint64_t g_tls_sll_invalid_head[TINY_NUM_CLASSES]; -extern _Atomic uint64_t g_tls_sll_invalid_push[TINY_NUM_CLASSES]; -static void tiny_tls_sll_diag_atexit(void) __attribute__((destructor)); -static void tiny_tls_sll_diag_atexit(void) { -#if !HAKMEM_BUILD_RELEASE - // Priority-2: Use cached ENV - if (!HAK_ENV_TINY_SLL_DIAG()) return; - fprintf(stderr, "\n[TLS_SLL_DIAG] invalid head/push counts per class\n"); - for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) { - uint64_t ih = atomic_load_explicit(&g_tls_sll_invalid_head[cls], memory_order_relaxed); - uint64_t ip = atomic_load_explicit(&g_tls_sll_invalid_push[cls], memory_order_relaxed); - if (ih || ip) { - fprintf(stderr, " C%d: invalid_head=%llu invalid_push=%llu\n", - cls, (unsigned long long)ih, (unsigned long long)ip); - } - } -#endif -} - // ============================================================================ // Performance Measurement: TLS SLL Statistics Print Function diff --git a/core/hakmem_tiny_ace_guard_box.inc b/core/hakmem_tiny_ace_guard_box.inc index ba729f45..8a2624a6 100644 --- a/core/hakmem_tiny_ace_guard_box.inc +++ b/core/hakmem_tiny_ace_guard_box.inc @@ -83,7 +83,6 @@ void tiny_guard_on_alloc(int cls, void* base, void* user, size_t stride) { if (!tiny_guard_enabled_runtime() || cls != g_tiny_guard_class) return; if (g_tiny_guard_seen++ >= g_tiny_guard_limit) return; uint8_t* b = (uint8_t*)base; - uint8_t* u = (uint8_t*)user; fprintf(stderr, "[TGUARD] alloc cls=%d base=%p user=%p stride=%zu hdr=%02x\n", cls, base, user, stride, b[0]); // 隣接ヘッダ可視化(前後) @@ -100,4 +99,3 @@ void tiny_guard_on_invalid(void* user_ptr, uint8_t hdr) { tiny_guard_dump_bytes("dump_before", u - 8, 8); tiny_guard_dump_bytes("dump_after", u, 8); } - diff --git a/core/hakmem_tiny_background.inc b/core/hakmem_tiny_background.inc index 7cf66c9c..e99bdb10 100644 --- 
a/core/hakmem_tiny_background.inc +++ b/core/hakmem_tiny_background.inc @@ -1,11 +1,7 @@ // Background Refill Bin (per-class lock-free SLL) — fills in background so the // front path only does a single CAS pop when both slots/bump are empty. static int g_bg_bin_enable = 0; // ENV toggle removed (fixed OFF) -static int g_bg_bin_target = 128; // Fixed target (legacy default) static _Atomic uintptr_t g_bg_bin_head[TINY_NUM_CLASSES]; -static pthread_t g_bg_bin_thread; -static volatile int g_bg_bin_stop = 0; -static int g_bg_bin_started = 0; // Inline helpers #include "hakmem_tiny_bg_bin.inc.h" @@ -25,65 +21,11 @@ static int g_bg_bin_started = 0; // Variables: g_bg_spill_enable, g_bg_spill_target, g_bg_spill_max_batch, g_bg_spill_head[], g_bg_spill_len[] -static void* tiny_bg_refill_main(void* arg) { - (void)arg; - const int sleep_us = 1000; // 1ms - while (!g_bg_bin_stop) { - if (!g_bg_bin_enable) { usleep(sleep_us); continue; } - for (int k = 0; k < TINY_NUM_CLASSES; k++) { - // まずは小クラスだけ対象(シンプルに) - if (!is_hot_class(k)) continue; - int have = bgbin_length_approx(k, g_bg_bin_target); - if (have >= g_bg_bin_target) continue; - int need = g_bg_bin_target - have; - - // 生成チェーンを作る(free listやbitmapから、裏で重い処理OK) - void* chain_head = NULL; void* chain_tail = NULL; int built = 0; - pthread_mutex_t* lock = &g_tiny_class_locks[k].m; - pthread_mutex_lock(lock); - TinySlab* slab = g_tiny_pool.free_slabs[k]; - // Adopt first slab with free blocks; if none, allocate one - if (!slab) slab = allocate_new_slab(k); - while (need > 0 && slab) { - if (slab->free_count == 0) { slab = slab->next; continue; } - int idx = hak_tiny_find_free_block(slab); - if (idx < 0) { slab = slab->next; continue; } - hak_tiny_set_used(slab, idx); - slab->free_count--; - size_t bs = g_tiny_class_sizes[k]; - void* p = (char*)slab->base + (idx * bs); - // prepend to local chain - tiny_next_write(k, p, chain_head); // Box API: next pointer write - chain_head = p; - if (!chain_tail) chain_tail = p; - built++; 
need--; - } - pthread_mutex_unlock(lock); - - if (built > 0) { - bgbin_push_chain(k, chain_head, chain_tail); - } - } - // Drain background spill queues (SuperSlab freelist return) - // EXTRACTED: Drain logic moved to hakmem_tiny_bg_spill.c (Phase 2C-2) - if (g_bg_spill_enable) { - for (int k = 0; k < TINY_NUM_CLASSES; k++) { - pthread_mutex_t* lock = &g_tiny_class_locks[k].m; - bg_spill_drain_class(k, lock); - } - } - // Drain remote frees - REMOVED (dead code cleanup 2025-11-27) - // The g_bg_remote_enable feature was never enabled in production - usleep(sleep_us); - } - return NULL; -} - static inline void eventq_push(int class_idx, uint32_t size) { eventq_push_ex(class_idx, size, HAK_TIER_FRONT, 0, 0, 0); } -static void* intelligence_engine_main(void* arg) { +static __attribute__((unused)) void* intelligence_engine_main(void* arg) { (void)arg; const int sleep_us = 100000; // 100ms int hist[TINY_NUM_CLASSES] = {0}; @@ -173,7 +115,7 @@ static void* intelligence_engine_main(void* arg) { } } - // Adapt per-class MAG/SLL caps (light-touch; protects hot classes) + // Adapt per-class MAG caps (light-touch; protects hot classes) if (adapt_caps) { for (int k = 0; k < TINY_NUM_CLASSES; k++) { int hot = (k <= 3); @@ -199,18 +141,6 @@ static void* intelligence_engine_main(void* arg) { if (cnt[k] > up_th) { mag += 16; if (mag > mag_max) mag = mag_max; } else if (cnt[k] < dn_th) { mag -= 16; if (mag < mag_min) mag = mag_min; } g_mag_cap_override[k] = mag; - - // SLL cap override (hot classes only); keep absolute cap modest - if (hot) { - int sll = g_sll_cap_override[k]; - if (sll <= 0) sll = 256; // starting point for hot classes - int sll_min = 128; - if (g_tiny_int_tight && g_tiny_cap_floor[k] > 0) sll_min = g_tiny_cap_floor[k]; - int sll_max = 1024; - if (cnt[k] > up_th) { sll += 32; if (sll > sll_max) sll = sll_max; } - else if (cnt[k] < dn_th) { sll -= 32; if (sll < sll_min) sll = sll_min; } - g_sll_cap_override[k] = sll; - } } } // Enforce Tiny RSS budget (if enabled): 
when over budget, shrink per-class caps by step @@ -221,7 +151,6 @@ static void* intelligence_engine_main(void* arg) { int floor = g_tiny_cap_floor[k]; if (floor <= 0) floor = 64; int mag = g_mag_cap_override[k]; if (mag <= 0) mag = tiny_effective_cap(k); mag -= g_tiny_diet_step; if (mag < floor) mag = floor; g_mag_cap_override[k] = mag; - // Phase12: SLL cap 調整は g_sll_cap_override ではなくポリシー側が担当するため、ここでは変更しない。 } } } diff --git a/core/hakmem_tiny_bg_bin.inc.h b/core/hakmem_tiny_bg_bin.inc.h index 01c9ca09..ec13109b 100644 --- a/core/hakmem_tiny_bg_bin.inc.h +++ b/core/hakmem_tiny_bg_bin.inc.h @@ -1,8 +1,7 @@ // Inline helpers for Background Refill Bin (lock-free SLL) // This header is textually included from hakmem_tiny.c after the following // symbols are defined: -// - g_bg_bin_enable, g_bg_bin_target, g_bg_bin_head[] -// - tiny_bg_refill_main() declaration/definition if needed +// - g_bg_bin_enable, g_bg_bin_head[] #include "box/tiny_next_ptr_box.h" // Phase E1-CORRECT: Box API for next pointer diff --git a/core/hakmem_tiny_bg_spill.c b/core/hakmem_tiny_bg_spill.c index 6848fe97..fb522072 100644 --- a/core/hakmem_tiny_bg_spill.c +++ b/core/hakmem_tiny_bg_spill.c @@ -45,6 +45,7 @@ void bg_spill_drain_class(int class_idx, pthread_mutex_t* lock) { #else const size_t next_off = 0; #endif +(void)next_off; #include "box/tiny_next_ptr_box.h" while (cur && processed < g_bg_spill_max_batch) { prev = cur; diff --git a/core/hakmem_tiny_fastcache.inc.h b/core/hakmem_tiny_fastcache.inc.h index 2feddb59..3b1e31a0 100644 --- a/core/hakmem_tiny_fastcache.inc.h +++ b/core/hakmem_tiny_fastcache.inc.h @@ -92,9 +92,7 @@ static inline __attribute__((always_inline)) hak_base_ptr_t tiny_fast_pop(int cl // Phase 7: header-aware next pointer (C0-C6: base+1, C7: base) #if HAKMEM_TINY_HEADER_CLASSIDX // Phase E1-CORRECT: ALL classes have 1-byte header, next ptr at offset 1 - const size_t next_offset = 1; #else - const size_t next_offset = 0; #endif // Phase E1-CORRECT: Use Box API for next 
pointer read (ALL classes: base+1) #include "box/tiny_next_ptr_box.h" @@ -172,9 +170,7 @@ static inline __attribute__((always_inline)) int tiny_fast_push(int class_idx, h // Phase 7: header-aware next pointer (C0-C6: base+1, C7: base) #if HAKMEM_TINY_HEADER_CLASSIDX // Phase E1-CORRECT: ALL classes have 1-byte header, next ptr at offset 1 - const size_t next_offset2 = 1; #else - const size_t next_offset2 = 0; #endif // Phase E1-CORRECT: Use Box API for next pointer write (ALL classes: base+1) #include "box/tiny_next_ptr_box.h" diff --git a/core/hakmem_tiny_free.inc b/core/hakmem_tiny_free.inc index e05bd1e8..c784b83e 100644 --- a/core/hakmem_tiny_free.inc +++ b/core/hakmem_tiny_free.inc @@ -29,7 +29,8 @@ static inline int tiny_drain_to_sll_budget(void) { if (__builtin_expect(v == -1, 0)) { const char* s = getenv("HAKMEM_TINY_DRAIN_TO_SLL"); int parsed = (s && *s) ? atoi(s) : 0; - if (parsed < 0) parsed = 0; if (parsed > 256) parsed = 256; + if (parsed < 0) parsed = 0; + if (parsed > 256) parsed = 256; v = parsed; } return v; @@ -673,15 +674,6 @@ void hak_tiny_shutdown(void) { tls->slab_base = NULL; } } - if (g_bg_bin_started) { - g_bg_bin_stop = 1; - if (!pthread_equal(tiny_self_pt(), g_bg_bin_thread)) { - pthread_join(g_bg_bin_thread, NULL); - } - g_bg_bin_started = 0; - g_bg_bin_enable = 0; - } - tiny_obs_shutdown(); if (g_int_engine && g_int_started) { g_int_stop = 1; // Best-effort join; avoid deadlock if called from within the thread diff --git a/core/hakmem_tiny_globals_box.inc b/core/hakmem_tiny_globals_box.inc index dc056d00..62cceee9 100644 --- a/core/hakmem_tiny_globals_box.inc +++ b/core/hakmem_tiny_globals_box.inc @@ -195,8 +195,6 @@ static __thread uint64_t g_tls_trim_seen[TINY_NUM_CLASSES]; static _Atomic(SuperSlab*) g_ss_partial_ring[TINY_NUM_CLASSES][SS_PARTIAL_RING]; static _Atomic(uint32_t) g_ss_partial_rr[TINY_NUM_CLASSES]; static _Atomic(SuperSlab*) g_ss_partial_over[TINY_NUM_CLASSES]; -static __thread int g_tls_adopt_cd[TINY_NUM_CLASSES]; 
-static int g_adopt_cool_period = -1; // env: HAKMEM_TINY_SS_ADOPT_COOLDOWN // Debug counters (per class): publish/adopt hits (visible when HAKMEM_DEBUG_COUNTERS) unsigned long long g_ss_publish_dbg[TINY_NUM_CLASSES] = {0}; diff --git a/core/hakmem_tiny_init.inc b/core/hakmem_tiny_init.inc index 5bd2daa7..b337f7d9 100644 --- a/core/hakmem_tiny_init.inc +++ b/core/hakmem_tiny_init.inc @@ -2,6 +2,7 @@ // Note: uses TLS ops inline helpers for prewarm when class5 hotpath is enabled #include "hakmem_tiny_tls_ops.h" #include "box/prewarm_box.h" // Box Prewarm API (Priority 3) +#include "box/tiny_route_box.h" // Phase 2D-2: Initialization function extraction // // This file contains the hak_tiny_init() function extracted from hakmem_tiny.c @@ -260,10 +261,6 @@ void hak_tiny_init(void) { snprintf(var, sizeof(var), "HAKMEM_TINY_MAG_CAP_C%d", i); char* vm = getenv(var); if (vm) { int v = atoi(vm); if (v > 0 && v <= TINY_TLS_MAG_CAP) g_mag_cap_override[i] = v; } - snprintf(var, sizeof(var), "HAKMEM_TINY_SLL_CAP_C%d", i); - char* vs = getenv(var); - // Phase12: g_sll_cap_override はレガシー互換ダミー。SLL cap は sll_cap_for_class()/TinyAcePolicy が担当するため、ここでは無視する。 - // Front refill count per-class override (fast path tuning) snprintf(var, sizeof(var), "HAKMEM_TINY_REFILL_COUNT_C%d", i); char* rc = getenv(var); @@ -395,23 +392,7 @@ void hak_tiny_init(void) { // - full: 全クラス TINY_ONLY tiny_route_init(); - tiny_obs_start_if_needed(); - - // Deferred Intelligence Engine - char* ie = getenv("HAKMEM_INT_ENGINE"); - if (ie && atoi(ie) != 0) { - g_int_engine = 1; - // Initialize frontend fill targets to zero (let engine grow if hot) - for (int i = 0; i < TINY_NUM_CLASSES; i++) atomic_store(&g_frontend_fill_target[i], 0); - // Event logging knobs (optional) - char* its = getenv("HAKMEM_INT_EVENT_TS"); - if (its && atoi(its) != 0) g_int_event_ts = 1; - char* ism = getenv("HAKMEM_INT_SAMPLE"); - if (ism) { int n = atoi(ism); if (n > 0 && n < 31) g_int_sample_mask = ((1u << n) - 1u); } - if 
(pthread_create(&g_int_thread, NULL, intelligence_engine_main, NULL) == 0) { - g_int_started = 1; - } - } + // OBS/INT エンジンは無効化(実験用)。必要なら復活させる。 // Step 2: Initialize Slab Registry (only if enabled) if (g_use_registry) { diff --git a/core/hakmem_tiny_intel.inc b/core/hakmem_tiny_intel.inc index 0d25fda4..2120b6c4 100644 --- a/core/hakmem_tiny_intel.inc +++ b/core/hakmem_tiny_intel.inc @@ -22,58 +22,17 @@ static pthread_t g_int_thread; static volatile int g_int_stop = 0; static int g_int_started = 0; -// Lightweight observation ring (async aggregation for TLS stats) -typedef struct { - uint8_t kind; - uint8_t class_idx; - uint16_t count; -} TinyObsEvent; -typedef struct { - uint64_t hit; - uint64_t miss; - uint64_t spill_ss; - uint64_t spill_owner; - uint64_t spill_mag; - uint64_t spill_requeue; -} TinyObsStats; +// OBS (観測) 機能は無効化。必要になった場合は git 履歴から復活させる。 +#define TINY_OBS_TLS_HIT 1 +#define TINY_OBS_TLS_MISS 2 +#define TINY_OBS_SPILL_SS 3 +#define TINY_OBS_SPILL_OWNER 4 +#define TINY_OBS_SPILL_MAG 5 +#define TINY_OBS_SPILL_REQUEUE 6 -enum { - TINY_OBS_TLS_HIT = 1, - TINY_OBS_TLS_MISS = 2, - TINY_OBS_SPILL_SS = 3, - TINY_OBS_SPILL_OWNER = 4, - TINY_OBS_SPILL_MAG = 5, - TINY_OBS_SPILL_REQUEUE = 6, -}; - -#define TINY_OBS_CAP 4096u -#define TINY_OBS_MASK (TINY_OBS_CAP - 1u) -static _Atomic uint32_t g_obs_tail = 0; -static _Atomic uint32_t g_obs_head = 0; -static TinyObsEvent g_obs_ring[TINY_OBS_CAP]; -static _Atomic uint8_t g_obs_ready[TINY_OBS_CAP]; -static int g_obs_enable = 0; // ENV toggle removed: observation disabled by default -static int g_obs_started = 0; -static pthread_t g_obs_thread; -static volatile int g_obs_stop = 0; -static TinyObsStats g_obs_stats[TINY_NUM_CLASSES]; -static uint64_t g_obs_epoch = 0; -static uint32_t g_obs_interval_default = 65536; -static uint32_t g_obs_interval_current = 65536; -static uint32_t g_obs_interval_min = 256; -static uint32_t g_obs_interval_max = 65536; -static uint32_t g_obs_interval_cooldown = 4; -static uint64_t 
g_obs_last_interval_epoch = 0; -static int g_obs_auto_tune = 0; // Default: Disable auto-tuning for predictable memory usage -static int g_obs_mag_step = 8; -static int g_obs_sll_step = 16; -static int g_obs_debug = 0; -static uint64_t g_obs_last_hit[TINY_NUM_CLASSES]; -static uint64_t g_obs_last_miss[TINY_NUM_CLASSES]; -static uint64_t g_obs_last_spill_ss[TINY_NUM_CLASSES]; -static uint64_t g_obs_last_spill_owner[TINY_NUM_CLASSES]; -static uint64_t g_obs_last_spill_mag[TINY_NUM_CLASSES]; -static uint64_t g_obs_last_spill_requeue[TINY_NUM_CLASSES]; +static inline void tiny_obs_update_interval(void) {} +static inline void tiny_obs_record(uint8_t kind, int class_idx) { (void)kind; (void)class_idx; } +static inline void tiny_obs_process(const void* ev_unused) { (void)ev_unused; } // --------------------------------------------------------------------------- // Tiny ACE (Adaptive Cache Engine) state machine @@ -139,7 +98,7 @@ static inline uint64_t tiny_ace_ema(uint64_t prev, uint64_t sample) { // EXTRACTED: static int get_rss_kb_self(void); -static void tiny_ace_update_mem_tight(uint64_t now_ns) { +static __attribute__((unused)) void tiny_ace_update_mem_tight(uint64_t now_ns) { if (g_tiny_rss_budget_kb <= 0) { g_ace_mem_tight_flag = 0; return; @@ -157,105 +116,23 @@ static void tiny_ace_update_mem_tight(uint64_t now_ns) { } } -static void tiny_ace_collect_stats(int idx, const TinyObsStats* st); -static void tiny_ace_refresh_hot_ranks(void); -static void tiny_ace_apply_policies(void); -static void tiny_ace_init_defaults(void); -static void tiny_obs_update_interval(void); - -static __thread uint32_t g_obs_hit_accum[TINY_NUM_CLASSES]; - -static inline void tiny_obs_enqueue(uint8_t kind, int class_idx, uint16_t count) { - uint32_t tail; - for (;;) { - tail = atomic_load_explicit(&g_obs_tail, memory_order_relaxed); - uint32_t head = atomic_load_explicit(&g_obs_head, memory_order_acquire); - if (tail - head >= TINY_OBS_CAP) return; // drop on overflow - uint32_t desired = 
tail + 1u; - if (atomic_compare_exchange_weak_explicit(&g_obs_tail, - &tail, - desired, - memory_order_acq_rel, - memory_order_relaxed)) { - break; - } - } - uint32_t idx = tail & TINY_OBS_MASK; - TinyObsEvent ev; - ev.kind = kind; - ev.class_idx = (uint8_t)class_idx; - ev.count = count; - g_obs_ring[idx] = ev; - atomic_store_explicit(&g_obs_ready[idx], 1u, memory_order_release); -} - -static inline void tiny_obs_record(uint8_t kind, int class_idx) { - if (__builtin_expect(!g_obs_enable, 0)) return; - if (__builtin_expect(kind == TINY_OBS_TLS_HIT, 1)) { - uint32_t interval = g_obs_interval_current; - if (interval <= 1u) { - tiny_obs_enqueue(kind, class_idx, 1u); - return; - } - uint32_t accum = ++g_obs_hit_accum[class_idx]; - if (accum < interval) return; - uint32_t emit = interval; - if (emit > UINT16_MAX) emit = UINT16_MAX; - if (accum > emit) { - g_obs_hit_accum[class_idx] = accum - emit; - } else { - g_obs_hit_accum[class_idx] = 0u; - } - tiny_obs_enqueue(kind, class_idx, (uint16_t)emit); - return; - } - tiny_obs_enqueue(kind, class_idx, 1u); -} - -static inline void tiny_obs_process(const TinyObsEvent* ev) { - int idx = ev->class_idx; - uint16_t count = ev->count; - if (idx < 0 || idx >= TINY_NUM_CLASSES || count == 0) return; - switch (ev->kind) { - case TINY_OBS_TLS_HIT: - g_tls_hit_count[idx] += count; - break; - case TINY_OBS_TLS_MISS: - g_tls_miss_count[idx] += count; - break; - case TINY_OBS_SPILL_SS: - g_tls_spill_ss_count[idx] += count; - break; - case TINY_OBS_SPILL_OWNER: - g_tls_spill_owner_count[idx] += count; - break; - case TINY_OBS_SPILL_MAG: - g_tls_spill_mag_count[idx] += count; - break; - case TINY_OBS_SPILL_REQUEUE: - g_tls_spill_requeue_count[idx] += count; - break; - default: - break; - } -} - -static void tiny_ace_collect_stats(int idx, const TinyObsStats* st) { +static __attribute__((unused)) void tiny_ace_collect_stats(int idx, const void* st_unused) { TinyAceState* cs = &g_ace_state[idx]; TinyAcePolicy pol = g_ace_policy[idx]; uint64_t 
now = g_ace_tick_now_ns; - uint64_t ops = st->hit + st->miss; - uint64_t spills_total = st->spill_ss + st->spill_owner + st->spill_mag; - uint64_t remote_spill = st->spill_owner; - uint64_t miss = st->miss; + (void)st_unused; + uint64_t ops = 0; + uint64_t spills_total = 0; + uint64_t remote_spill = 0; + uint64_t miss = 0; cs->ema_ops = tiny_ace_ema(cs->ema_ops, ops); cs->ema_spill = tiny_ace_ema(cs->ema_spill, spills_total); cs->ema_remote = tiny_ace_ema(cs->ema_remote, remote_spill); cs->ema_miss = tiny_ace_ema(cs->ema_miss, miss); - if (ops == 0 && spills_total == 0 && st->spill_requeue == 0) { + if (ops == 0 && spills_total == 0) { pol.ema_ops_snapshot = cs->ema_ops; g_ace_policy[idx] = pol; return; @@ -264,7 +141,7 @@ static void tiny_ace_collect_stats(int idx, const TinyObsStats* st) { TinyAceStateId next_state; if (g_ace_mem_tight_flag) { next_state = ACE_STATE_MEM_TIGHT; - } else if (st->spill_requeue > 0) { + } else if (spills_total > 0) { next_state = ACE_STATE_BURST; } else if (cs->ema_remote > 16 && cs->ema_remote >= (cs->ema_spill / 3 + 1)) { next_state = ACE_STATE_REMOTE_HEAVY; @@ -300,14 +177,13 @@ static void tiny_ace_collect_stats(int idx, const TinyObsStats* st) { if (current_mag < mag_min) current_mag = mag_min; if (current_mag > mag_max) current_mag = mag_max; - int mag_step = (g_obs_mag_step > 0) ? g_obs_mag_step : ACE_MAG_STEP_DEFAULT; + int mag_step = ACE_MAG_STEP_DEFAULT; if (mag_step < 1) mag_step = 1; - // Phase12: g_sll_cap_override はレガシー互換ダミー。SLL cap は TinyAcePolicy に直接保持する。 int current_sll = pol.sll_cap; if (current_sll < current_mag) current_sll = current_mag; if (current_sll < 32) current_sll = 32; - int sll_step = (g_obs_sll_step > 0) ? 
g_obs_sll_step : ACE_SLL_STEP_DEFAULT; + int sll_step = ACE_SLL_STEP_DEFAULT; if (sll_step < 1) sll_step = 1; int sll_max = TINY_TLS_MAG_CAP; @@ -457,28 +333,10 @@ static void tiny_ace_collect_stats(int idx, const TinyObsStats* st) { pol.hotmag_refill = (uint16_t)hot_refill_new; pol.ema_ops_snapshot = cs->ema_ops; - if (g_obs_debug) { - static const char* state_names[] = {"steady", "burst", "remote", "tight"}; - fprintf(stderr, - "[ace] class %d state=%s ops=%llu spill=%llu remote=%llu miss=%llu mag=%d->%d sll=%d fast=%u hot=%d/%d\n", - idx, - state_names[cs->state], - (unsigned long long)ops, - (unsigned long long)spills_total, - (unsigned long long)remote_spill, - (unsigned long long)miss, - current_mag, - new_mag, - new_sll, - (unsigned)new_fast, - hot_cap_new, - hot_refill_new); - } - g_ace_policy[idx] = pol; } -static void tiny_ace_refresh_hot_ranks(void) { +static __attribute__((unused)) void tiny_ace_refresh_hot_ranks(void) { int top1 = -1, top2 = -1, top3 = -1; uint64_t val1 = 0, val2 = 0, val3 = 0; for (int i = 0; i < TINY_NUM_CLASSES; i++) { @@ -554,7 +412,7 @@ static void tiny_ace_refresh_hot_ranks(void) { } } -static void tiny_ace_apply_policies(void) { +static __attribute__((unused)) void tiny_ace_apply_policies(void) { for (int i = 0; i < TINY_NUM_CLASSES; i++) { TinyAcePolicy* pol = &g_ace_policy[i]; @@ -570,7 +428,7 @@ static void tiny_ace_apply_policies(void) { tiny_tls_publish_targets(i, (uint32_t)new_mag); } if (pol->request_trim || new_mag < prev_mag) { - tiny_tls_request_trim(i, g_obs_epoch); + tiny_tls_request_trim(i, 0); } int new_sll = pol->sll_cap; @@ -602,8 +460,7 @@ static void tiny_ace_apply_policies(void) { } } } - -static void tiny_ace_init_defaults(void) { +static __attribute__((unused)) void tiny_ace_init_defaults(void) { uint64_t now = tiny_ace_now_ns(); int mult = (g_sll_multiplier > 0) ? 
g_sll_multiplier : 2; for (int i = 0; i < TINY_NUM_CLASSES; i++) { @@ -635,7 +492,6 @@ static void tiny_ace_init_defaults(void) { pol->hotmag_refill = hotmag_refill_target(i); if (g_mag_cap_override[i] <= 0) g_mag_cap_override[i] = pol->mag_cap; - // Phase12: g_sll_cap_override は使用しない(互換用ダミー) switch (i) { case 0: g_hot_alloc_fn[i] = tiny_hot_pop_class0; break; case 1: g_hot_alloc_fn[i] = tiny_hot_pop_class1; break; @@ -649,42 +505,6 @@ static void tiny_ace_init_defaults(void) { } } -static void tiny_obs_update_interval(void) { - if (!g_obs_auto_tune) return; - uint32_t current = g_obs_interval_current; - int active_states = 0; - for (int i = 0; i < TINY_NUM_CLASSES; i++) { - if (g_ace_policy[i].state != ACE_STATE_STEADY) { - active_states++; - } - } - int urgent = g_ace_mem_tight_flag || (active_states > 0); - if (urgent) { - uint32_t target = g_obs_interval_min; - if (target < 1u) target = 1u; - if (current != target) { - g_obs_interval_current = target; - g_obs_last_interval_epoch = g_obs_epoch; - if (g_obs_debug) { - fprintf(stderr, "[obs] interval -> %u (urgent)\n", target); - } - } - return; - } - if (current >= g_obs_interval_max) return; - if ((g_obs_epoch - g_obs_last_interval_epoch) < g_obs_interval_cooldown) return; - uint32_t target = current << 1; - if (target < current) target = g_obs_interval_max; // overflow guard - if (target > g_obs_interval_max) target = g_obs_interval_max; - if (target != current) { - g_obs_interval_current = target; - g_obs_last_interval_epoch = g_obs_epoch; - if (g_obs_debug) { - fprintf(stderr, "[obs] interval -> %u (steady)\n", target); - } - } -} - static inline void superslab_partial_release(SuperSlab* ss, uint32_t epoch) { #if defined(MADV_DONTNEED) if (!g_ss_partial_enable) return; @@ -700,116 +520,6 @@ static inline void superslab_partial_release(SuperSlab* ss, uint32_t epoch) { #endif } -static inline void tiny_obs_adjust_class(int idx, const TinyObsStats* st) { - if (!g_obs_auto_tune) return; - 
tiny_ace_collect_stats(idx, st); -} - -static void tiny_obs_apply_tuning(void) { - g_obs_epoch++; - g_ace_tick_now_ns = tiny_ace_now_ns(); - tiny_ace_update_mem_tight(g_ace_tick_now_ns); - for (int i = 0; i < TINY_NUM_CLASSES; i++) { - uint64_t cur_hit = g_tls_hit_count[i]; - uint64_t cur_miss = g_tls_miss_count[i]; - uint64_t cur_spill_ss = g_tls_spill_ss_count[i]; - uint64_t cur_spill_owner = g_tls_spill_owner_count[i]; - uint64_t cur_spill_mag = g_tls_spill_mag_count[i]; - uint64_t cur_spill_requeue = g_tls_spill_requeue_count[i]; - - TinyObsStats* stats = &g_obs_stats[i]; - stats->hit = cur_hit - g_obs_last_hit[i]; - stats->miss = cur_miss - g_obs_last_miss[i]; - stats->spill_ss = cur_spill_ss - g_obs_last_spill_ss[i]; - stats->spill_owner = cur_spill_owner - g_obs_last_spill_owner[i]; - stats->spill_mag = cur_spill_mag - g_obs_last_spill_mag[i]; - stats->spill_requeue = cur_spill_requeue - g_obs_last_spill_requeue[i]; - - g_obs_last_hit[i] = cur_hit; - g_obs_last_miss[i] = cur_miss; - g_obs_last_spill_ss[i] = cur_spill_ss; - g_obs_last_spill_owner[i] = cur_spill_owner; - g_obs_last_spill_mag[i] = cur_spill_mag; - g_obs_last_spill_requeue[i] = cur_spill_requeue; - - tiny_obs_adjust_class(i, stats); - } - if (g_obs_auto_tune) { - tiny_ace_refresh_hot_ranks(); - tiny_ace_apply_policies(); - tiny_obs_update_interval(); - } -} - -static void* tiny_obs_worker(void* arg) { - (void)arg; - uint32_t processed = 0; - while (!g_obs_stop) { - uint32_t head = atomic_load_explicit(&g_obs_head, memory_order_relaxed); - uint32_t tail = atomic_load_explicit(&g_obs_tail, memory_order_acquire); - if (head == tail) { - if (processed > 0) { - tiny_obs_apply_tuning(); - processed = 0; - } - struct timespec ts = {0, 1000000}; // 1.0 ms backoff when idle - nanosleep(&ts, NULL); - continue; - } - uint32_t idx = head & TINY_OBS_MASK; - if (!atomic_load_explicit(&g_obs_ready[idx], memory_order_acquire)) { - sched_yield(); - continue; - } - TinyObsEvent ev = g_obs_ring[idx]; - 
atomic_store_explicit(&g_obs_ready[idx], 0u, memory_order_release); - atomic_store_explicit(&g_obs_head, head + 1u, memory_order_relaxed); - tiny_obs_process(&ev); - if (++processed >= g_obs_interval_current) { - tiny_obs_apply_tuning(); - processed = 0; - } - } - // Drain remaining events before exit - for (;;) { - uint32_t head = atomic_load_explicit(&g_obs_head, memory_order_relaxed); - uint32_t tail = atomic_load_explicit(&g_obs_tail, memory_order_acquire); - if (head == tail) break; - uint32_t idx = head & TINY_OBS_MASK; - if (!atomic_load_explicit(&g_obs_ready[idx], memory_order_acquire)) { - sched_yield(); - continue; - } - TinyObsEvent ev = g_obs_ring[idx]; - atomic_store_explicit(&g_obs_ready[idx], 0u, memory_order_release); - atomic_store_explicit(&g_obs_head, head + 1u, memory_order_relaxed); - tiny_obs_process(&ev); - } - tiny_obs_apply_tuning(); - return NULL; -} - -static void tiny_obs_start_if_needed(void) { - // OBS runtime knobs removed; keep disabled for predictable memory use. 
- g_obs_enable = 0; - g_obs_started = 0; - (void)g_obs_interval_default; - (void)g_obs_interval_current; - (void)g_obs_interval_min; - (void)g_obs_interval_max; - (void)g_obs_auto_tune; - (void)g_obs_mag_step; - (void)g_obs_sll_step; - (void)g_obs_debug; -} - -static void tiny_obs_shutdown(void) { - if (!g_obs_started) return; - g_obs_stop = 1; - pthread_join(g_obs_thread, NULL); - g_obs_started = 0; - g_obs_enable = 0; -} // Tiny diet (memory-tight) controls // Event logging options: default minimal (no timestamp, no thread id) static int g_int_event_ts = 0; // HAKMEM_INT_EVENT_TS=1 to include timestamp diff --git a/core/hakmem_tiny_magazine.c b/core/hakmem_tiny_magazine.c index 806ac5f0..19e5c0f1 100644 --- a/core/hakmem_tiny_magazine.c +++ b/core/hakmem_tiny_magazine.c @@ -121,6 +121,7 @@ void hak_tiny_magazine_flush(int class_idx) { // Lock and flush entire Magazine to freelist pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m; struct timespec tss; int ss_time = hkm_prof_begin(&tss); + (void)ss_time; (void)tss; pthread_mutex_lock(lock); // Flush ALL blocks (not just half like normal spill) diff --git a/core/hakmem_tiny_refill.inc.h b/core/hakmem_tiny_refill.inc.h index e781aa43..3eb8af3b 100644 --- a/core/hakmem_tiny_refill.inc.h +++ b/core/hakmem_tiny_refill.inc.h @@ -198,6 +198,7 @@ static inline void* superslab_tls_bump_fast(int class_idx) { // 旧来の複雑な経路を削り、FC/SLLのみの最小ロジックにする。 static inline void* tiny_fast_refill_and_take(int class_idx, TinyTLSList* tls) { + (void)tls; // 1) Front FastCache から直接 // Phase 7-Step6-Fix: Use config macro for dead code elimination in PGO mode if (__builtin_expect(TINY_FRONT_FASTCACHE_ENABLED && class_idx <= 3, 1)) { diff --git a/core/hakmem_tiny_sll_cap_box.inc b/core/hakmem_tiny_sll_cap_box.inc index 9a2a6ba4..58f7b167 100644 --- a/core/hakmem_tiny_sll_cap_box.inc +++ b/core/hakmem_tiny_sll_cap_box.inc @@ -1,5 +1,5 @@ static inline uint32_t sll_cap_for_class(int class_idx, uint32_t mag_cap) { - // Phase12: 
g_sll_cap_override は非推奨。ここでは無視して通常capを返す。 + // Phase12+: 旧 g_sll_cap_override は削除済み。ここでは通常capのみを使用する。 uint32_t cap = mag_cap; if (class_idx <= 3) { uint32_t mult = (g_sll_multiplier > 0 ? (uint32_t)g_sll_multiplier : 1u); diff --git a/core/hakmem_tiny_stats.c b/core/hakmem_tiny_stats.c index f91a9818..a33cc2ce 100644 --- a/core/hakmem_tiny_stats.c +++ b/core/hakmem_tiny_stats.c @@ -91,34 +91,7 @@ void hak_tiny_print_stats(void) { (unsigned long long)g_tls_spill_requeue_count[i]); } printf("---------------------------------------------\n\n"); - // Observation snapshot (disabled unless Tiny obs is explicitly enabled) -#ifdef HAKMEM_TINY_OBS_ENABLE - extern unsigned long long g_obs_epoch; - extern unsigned int g_obs_interval; - typedef struct { - unsigned long long hit, miss, spill_ss, spill_owner, spill_mag, spill_requeue; - } TinyObsStats; - extern TinyObsStats g_obs_stats[TINY_NUM_CLASSES]; - printf("Observation Snapshot (epoch %llu, interval %u events)\n", - (unsigned long long)g_obs_epoch, - g_obs_interval); - printf("Class | dHit | dMiss | dSpSS | dSpOwn | dSpMag | dSpReq\n"); - printf("------+-----------+-----------+-----------+-----------+-----------+-----------\n"); - for (int i = 0; i < TINY_NUM_CLASSES; i++) { - TinyObsStats* st = &g_obs_stats[i]; - printf(" %d | %9llu | %9llu | %9llu | %9llu | %9llu | %9llu\n", - i, - (unsigned long long)st->hit, - (unsigned long long)st->miss, - (unsigned long long)st->spill_ss, - (unsigned long long)st->spill_owner, - (unsigned long long)st->spill_mag, - (unsigned long long)st->spill_requeue); - } - printf("---------------------------------------------\n\n"); -#else - printf("Observation Snapshot: disabled (build-time)\n\n"); -#endif + printf("Observation Snapshot: removed (obs pipeline retired)\n\n"); #endif } diff --git a/core/hakmem_tiny_tls_ops.h b/core/hakmem_tiny_tls_ops.h index 577fa282..2114f451 100644 --- a/core/hakmem_tiny_tls_ops.h +++ b/core/hakmem_tiny_tls_ops.h @@ -67,6 +67,7 @@ static inline int 
tls_refill_from_tls_slab(int class_idx, TinyTLSList* tls, uint #else const size_t next_off_tls = 0; #endif + (void)next_off_tls; void* accum_head = NULL; void* accum_tail = NULL; uint32_t total = 0u; diff --git a/core/hakmem_tiny_tls_state_box.inc b/core/hakmem_tiny_tls_state_box.inc index f0593ae4..ef36a954 100644 --- a/core/hakmem_tiny_tls_state_box.inc +++ b/core/hakmem_tiny_tls_state_box.inc @@ -24,7 +24,7 @@ __thread uint64_t g_tls_canary_after_sll = TLS_CANARY_MAGIC; __thread const char* g_tls_sll_last_writer[TINY_NUM_CLASSES] = {0}; __thread TinyHeapV2Mag g_tiny_heap_v2_mag[TINY_NUM_CLASSES] = {0}; __thread TinyHeapV2Stats g_tiny_heap_v2_stats[TINY_NUM_CLASSES] = {0}; -static __thread int g_tls_heap_v2_initialized = 0; +__thread int g_tls_heap_v2_initialized = 0; // Phase 1: TLS SuperSlab Hint Box for Headerless mode // Size: 112 bytes per thread (4 slots * 24 bytes + 16 bytes overhead) @@ -109,11 +109,7 @@ unsigned long long g_front_fc_miss[TINY_NUM_CLASSES] = {0}; // TLS SLL class mask: bit i = 1 allows SLL for class i. Default: all 8 classes enabled. 
int g_tls_sll_class_mask = 0xFF; // Phase 6-1.7: Export for box refactor (Box 6 needs access from hakmem.c) -#ifdef HAKMEM_TINY_PHASE6_BOX_REFACTOR -inline __attribute__((always_inline)) pthread_t tiny_self_pt(void) { -#else static inline __attribute__((always_inline)) pthread_t tiny_self_pt(void) { -#endif if (__builtin_expect(!g_tls_pt_inited, 0)) { g_tls_pt_self = pthread_self(); g_tls_pt_inited = 1; @@ -125,7 +121,6 @@ static inline __attribute__((always_inline)) pthread_t tiny_self_pt(void) { // tiny_mmap_gate.h already included at top #include "tiny_publish.h" -int g_sll_cap_override[TINY_NUM_CLASSES] = {0}; // LEGACY (Phase12以降は参照しない/互換用ダミー) // Optional prefetch on SLL pop (guarded by env: HAKMEM_TINY_PREFETCH=1) static int g_tiny_prefetch = 0; diff --git a/core/mid_tcache.h b/core/mid_tcache.h index 08325a4f..445a0680 100644 --- a/core/mid_tcache.h +++ b/core/mid_tcache.h @@ -24,7 +24,8 @@ static inline int midtc_cap_global(void) { if (__builtin_expect(cap == -1, 0)) { const char* s = getenv("HAKMEM_MID_TC_CAP"); int v = (s && *s) ? 
atoi(s) : 32; // conservative default - if (v < 0) v = 0; if (v > 1024) v = 1024; + if (v < 0) v = 0; + if (v > 1024) v = 1024; cap = v; } return cap; @@ -56,4 +57,3 @@ static inline void* midtc_pop(int class_idx) { if (g_midtc_count[class_idx] > 0) g_midtc_count[class_idx]--; return h; } - diff --git a/core/smallobject_hotbox_v3.c b/core/smallobject_hotbox_v3.c index 56488d6d..b311e4fd 100644 --- a/core/smallobject_hotbox_v3.c +++ b/core/smallobject_hotbox_v3.c @@ -59,6 +59,11 @@ void so_v3_record_free_fallback(uint8_t ci) { if (st) atomic_fetch_add_explicit(&st->free_fallback_v1, 1, memory_order_relaxed); } +void so_v3_record_page_of_fail(uint8_t ci) { + so_stats_class_v3* st = so_stats_for(ci); + if (st) atomic_fetch_add_explicit(&st->page_of_fail, 1, memory_order_relaxed); +} + so_ctx_v3* so_tls_get(void) { so_ctx_v3* ctx = g_so_ctx_v3; if (__builtin_expect(ctx == NULL, 0)) { @@ -208,6 +213,7 @@ static inline void so_free_fast(so_ctx_v3* ctx, uint32_t ci, void* ptr) { so_class_v3* hc = &ctx->cls[ci]; so_page_v3* page = so_page_of(hc, ptr); if (!page) { + so_v3_record_page_of_fail((uint8_t)ci); so_v3_record_free_fallback((uint8_t)ci); tiny_heap_free_class_fast(tiny_heap_ctx_for_thread(), (int)ci, ptr); return; @@ -243,6 +249,14 @@ static inline so_page_v3* so_alloc_refill_slow(so_ctx_v3* ctx, uint32_t ci) { if (!cold.refill_page) return NULL; so_page_v3* page = cold.refill_page(cold_ctx, ci); if (!page) return NULL; + if (!page->base || page->capacity == 0) { + if (cold.retire_page) { + cold.retire_page(cold_ctx, ci, page); + } else { + free(page); + } + return NULL; + } if (page->block_size == 0) { page->block_size = (uint32_t)tiny_stride_for_class((int)ci); @@ -306,6 +320,18 @@ void so_free(uint32_t class_idx, void* ptr) { so_free_fast(ctx, class_idx, ptr); } +int smallobject_hotbox_v3_can_own_c7(void* ptr) { + if (!ptr) return 0; + if (!small_heap_v3_c7_enabled()) return 0; + so_ctx_v3* ctx = g_so_ctx_v3; + if (!ctx) return 0; // no ownership if TLS is not initialized yet +
so_class_v3* hc = &ctx->cls[7]; + so_page_v3* page = so_page_of(hc, ptr); + if (!page) return 0; + if (page->class_idx != 7) return 0; + return 1; +} + __attribute__((destructor)) static void so_v3_stats_dump(void) { if (!so_v3_stats_enabled()) return; @@ -317,9 +343,11 @@ static void so_v3_stats_dump(void) { uint64_t afb = atomic_load_explicit(&st->alloc_fallback_v1, memory_order_relaxed); uint64_t fc = atomic_load_explicit(&st->free_calls, memory_order_relaxed); uint64_t ffb = atomic_load_explicit(&st->free_fallback_v1, memory_order_relaxed); - if (rh + ac + afb + fc + ffb + ar == 0) continue; - fprintf(stderr, "[SMALL_HEAP_V3_STATS] cls=%d route_hits=%llu alloc_calls=%llu alloc_refill=%llu alloc_fb_v1=%llu free_calls=%llu free_fb_v1=%llu\n", + uint64_t pof = atomic_load_explicit(&st->page_of_fail, memory_order_relaxed); + if (rh + ac + afb + fc + ffb + ar + pof == 0) continue; + fprintf(stderr, "[SMALL_HEAP_V3_STATS] cls=%d route_hits=%llu alloc_calls=%llu alloc_refill=%llu alloc_fb_v1=%llu free_calls=%llu free_fb_v1=%llu page_of_fail=%llu\n", i, (unsigned long long)rh, (unsigned long long)ac, - (unsigned long long)ar, (unsigned long long)afb, (unsigned long long)fc, (unsigned long long)ffb); + (unsigned long long)ar, (unsigned long long)afb, (unsigned long long)fc, + (unsigned long long)ffb, (unsigned long long)pof); } } diff --git a/core/superslab/superslab_inline.h b/core/superslab/superslab_inline.h index 4b9ed9d5..875c0b0d 100644 --- a/core/superslab/superslab_inline.h +++ b/core/superslab/superslab_inline.h @@ -3,6 +3,10 @@ #include "superslab_types.h" #include "../tiny_box_geometry.h" // Box 3 geometry helpers (stride/base/capacity) +#include "../hakmem_super_registry.h" // Provides hak_super_lookup implementations + +// Forward declaration to avoid implicit declaration when building without LTO. 
+static inline SuperSlab* hak_super_lookup(void* ptr); // Forward declaration for unsafe remote drain used by refill/handle paths // Implemented in hakmem_tiny_superslab.c @@ -30,11 +34,6 @@ extern _Atomic uint64_t g_ss_active_dec_calls; // - ss_lookup_guarded() : 100-200 cycles, adds integrity checks // - ss_fast_lookup() : Backward compatible (→ ss_lookup_safe) // -// Note: hak_super_lookup() is implemented in hakmem_super_registry.h as static inline. -// We provide a forward declaration here so that ss_lookup_guarded() can call it -// even in translation units where hakmem_super_registry.h is included later. -static inline SuperSlab* hak_super_lookup(void* ptr); - // ============================================================================ // Contract Level 1: UNSAFE - Fast but dangerous (internal use only) // ============================================================================ diff --git a/core/superslab_cache.c b/core/superslab_cache.c index 4a574400..abd1ef15 100644 --- a/core/superslab_cache.c +++ b/core/superslab_cache.c @@ -51,6 +51,10 @@ void ss_cache_ensure_init(void) { void* ss_os_acquire(uint8_t size_class, size_t ss_size, uintptr_t ss_mask, int populate) { void* ptr = NULL; static int log_count = 0; + (void)populate; +#if HAKMEM_BUILD_RELEASE + (void)log_count; +#endif #ifdef MAP_ALIGNED_SUPER // MAP_POPULATE: Pre-fault pages to eliminate runtime page faults (60% of CPU overhead) @@ -91,6 +95,9 @@ void* ss_os_acquire(uint8_t size_class, size_t ss_size, uintptr_t ss_mask, int p log_count++; } #endif +#if HAKMEM_BUILD_RELEASE + (void)count; +#endif } if (raw == MAP_FAILED) { log_superslab_oom_once(ss_size, alloc_size, errno); diff --git a/core/superslab_stats.c b/core/superslab_stats.c index 061c6411..416d1ef6 100644 --- a/core/superslab_stats.c +++ b/core/superslab_stats.c @@ -106,6 +106,7 @@ void ss_stats_on_ss_scan(int class_idx, int slab_live, int is_empty) { // ============================================================================ 
void log_superslab_oom_once(size_t ss_size, size_t alloc_size, int err) { + (void)ss_size; (void)alloc_size; (void)err; static int logged = 0; if (logged) return; logged = 1; diff --git a/core/tiny_alloc_fast.inc.h b/core/tiny_alloc_fast.inc.h index a2b914e1..7b2553c9 100644 --- a/core/tiny_alloc_fast.inc.h +++ b/core/tiny_alloc_fast.inc.h @@ -177,127 +177,6 @@ static void tiny_fast_print_profile(void) { } // ========== Front-V2 helpers (tcache-like TLS magazine) ========== -// Priority-2: Use cached ENV (eliminate lazy-init overhead) -static inline int tiny_heap_v2_stats_enabled(void) { - return HAK_ENV_TINY_HEAP_V2_STATS(); -} - -// TLS HeapV2 initialization barrier (ensures mag->top is zero on first use) -static inline void tiny_heap_v2_ensure_init(void) { - extern __thread int g_tls_heap_v2_initialized; - extern __thread TinyHeapV2Mag g_tiny_heap_v2_mag[]; - - if (__builtin_expect(!g_tls_heap_v2_initialized, 0)) { - for (int i = 0; i < TINY_NUM_CLASSES; i++) { - g_tiny_heap_v2_mag[i].top = 0; - } - g_tls_heap_v2_initialized = 1; - } -} - -static inline int tiny_heap_v2_refill_mag(int class_idx) { - // FIX: Ensure TLS is initialized before first magazine access - tiny_heap_v2_ensure_init(); - if (class_idx < 0 || class_idx > 3) return 0; - if (!tiny_heap_v2_class_enabled(class_idx)) return 0; - - // Phase 7-Step7: Use config macro for dead code elimination in PGO mode - if (!TINY_FRONT_TLS_SLL_ENABLED) return 0; - - TinyHeapV2Mag* mag = &g_tiny_heap_v2_mag[class_idx]; - const int cap = TINY_HEAP_V2_MAG_CAP; - int filled = 0; - - // FIX: Validate mag->top before use (prevent uninitialized TLS corruption) - if (mag->top < 0 || mag->top > cap) { - static __thread int s_reset_logged[TINY_NUM_CLASSES] = {0}; - if (!s_reset_logged[class_idx]) { - fprintf(stderr, "[HEAP_V2_REFILL] C%d mag->top=%d corrupted, reset to 0\n", - class_idx, mag->top); - s_reset_logged[class_idx] = 1; - } - mag->top = 0; - } - - // First, steal from TLS SLL if already available. 
- while (mag->top < cap) { - void* base = NULL; - if (!tls_sll_pop(class_idx, &base)) break; - mag->items[mag->top++] = base; - filled++; - } - - // If magazine is still empty, ask backend to refill SLL once, then steal again. - if (mag->top < cap && filled == 0) { -#if HAKMEM_TINY_P0_BATCH_REFILL - (void)sll_refill_batch_from_ss(class_idx, cap); -#else - (void)sll_refill_small_from_ss(class_idx, cap); -#endif - while (mag->top < cap) { - void* base = NULL; - if (!tls_sll_pop(class_idx, &base)) break; - mag->items[mag->top++] = base; - filled++; - } - } - - if (__builtin_expect(tiny_heap_v2_stats_enabled(), 0)) { - if (filled > 0) { - g_tiny_heap_v2_stats[class_idx].refill_calls++; - g_tiny_heap_v2_stats[class_idx].refill_blocks += (uint64_t)filled; - } - } - return filled; -} - -static inline void* tiny_heap_v2_alloc_by_class(int class_idx) { - // FIX: Ensure TLS is initialized before first magazine access - tiny_heap_v2_ensure_init(); - if (class_idx < 0 || class_idx > 3) return NULL; - // Phase 7-Step8: Use config macro for dead code elimination in PGO mode - if (!TINY_FRONT_HEAP_V2_ENABLED) return NULL; - if (!tiny_heap_v2_class_enabled(class_idx)) return NULL; - - TinyHeapV2Mag* mag = &g_tiny_heap_v2_mag[class_idx]; - - // Hit: magazine has entries - if (__builtin_expect(mag->top > 0, 1)) { - // FIX: Add underflow protection before array access - const int cap = TINY_HEAP_V2_MAG_CAP; - if (mag->top > cap || mag->top < 0) { - static __thread int s_reset_logged[TINY_NUM_CLASSES] = {0}; - if (!s_reset_logged[class_idx]) { - fprintf(stderr, "[HEAP_V2_ALLOC] C%d mag->top=%d corrupted, reset to 0\n", - class_idx, mag->top); - s_reset_logged[class_idx] = 1; - } - mag->top = 0; - return NULL; // Fall through to refill path - } - if (__builtin_expect(tiny_heap_v2_stats_enabled(), 0)) { - g_tiny_heap_v2_stats[class_idx].alloc_calls++; - g_tiny_heap_v2_stats[class_idx].mag_hits++; - } - return mag->items[--mag->top]; - } - - // Miss: try single refill from SLL/backend - 
int filled = tiny_heap_v2_refill_mag(class_idx); - if (filled > 0 && mag->top > 0) { - if (__builtin_expect(tiny_heap_v2_stats_enabled(), 0)) { - g_tiny_heap_v2_stats[class_idx].alloc_calls++; - g_tiny_heap_v2_stats[class_idx].mag_hits++; - } - return mag->items[--mag->top]; - } - - if (__builtin_expect(tiny_heap_v2_stats_enabled(), 0)) { - g_tiny_heap_v2_stats[class_idx].backend_oom++; - } - return NULL; -} - // ========== Fast Path: TLS Freelist Pop (3-4 instructions) ========== // External SFC control (defined in hakmem_tiny_sfc.c) diff --git a/core/tiny_destructors.c b/core/tiny_destructors.c new file mode 100644 index 00000000..0039ecc3 --- /dev/null +++ b/core/tiny_destructors.c @@ -0,0 +1,297 @@ +// tiny_destructors.c — boxed Tiny shutdown handling and stats dumps +#include "tiny_destructors.h" + +#include <stdio.h> +#include <stdlib.h> + +#include "box/tiny_hotheap_v2_box.h" +#include "box/tiny_front_stats_box.h" +#include "box/tiny_heap_box.h" +#include "box/tiny_route_env_box.h" +#include "box/tls_sll_box.h" +#include "front/tiny_heap_v2.h" +#include "hakmem_env_cache.h" +#include "hakmem_tiny_magazine.h" +#include "hakmem_tiny_stats_api.h" + +static int g_flush_on_exit = 0; +static int g_ultra_debug_on_exit = 0; +static int g_path_debug_on_exit = 0; + +// HotHeap v2 stats storage (defined in hakmem_tiny.c) +extern _Atomic uint64_t g_tiny_hotheap_v2_route_hits[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_alloc_calls[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_alloc_fast[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_alloc_lease[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_alloc_fallback_v1[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_alloc_refill[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_refill_with_current[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_refill_with_partial[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t
g_tiny_hotheap_v2_alloc_route_fb[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_free_calls[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_free_fast[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_free_fallback_v1[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_cold_refill_fail[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_cold_retire_calls[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_retire_calls_v2[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_partial_pushes[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_partial_pops[TINY_HOTHEAP_MAX_CLASSES]; +extern _Atomic uint64_t g_tiny_hotheap_v2_partial_peak[TINY_HOTHEAP_MAX_CLASSES]; +extern TinyHotHeapV2PageStats g_tiny_hotheap_v2_page_stats[TINY_HOTHEAP_MAX_CLASSES]; + +extern _Atomic uint64_t g_tiny_alloc_ge1024[TINY_NUM_CLASSES]; +extern _Atomic uint64_t g_tls_sll_invalid_head[TINY_NUM_CLASSES]; +extern _Atomic uint64_t g_tls_sll_invalid_push[TINY_NUM_CLASSES]; + +static void hak_flush_tiny_exit(void) { + if (g_flush_on_exit) { + hak_tiny_magazine_flush_all(); + hak_tiny_trim(); + } + if (g_ultra_debug_on_exit) { + hak_tiny_ultra_debug_dump(); + } + // Path debug dump (optional): HAKMEM_TINY_PATH_DEBUG=1 + hak_tiny_path_debug_dump(); + // Extended counters (optional): HAKMEM_TINY_COUNTERS_DUMP=1 + hak_tiny_debug_counters_dump(); + + // DEBUG: Print SuperSlab accounting stats + extern _Atomic uint64_t g_ss_active_dec_calls; + extern _Atomic uint64_t g_hak_tiny_free_calls; + extern _Atomic uint64_t g_ss_remote_push_calls; + extern _Atomic uint64_t g_free_ss_enter; + extern _Atomic uint64_t g_free_local_box_calls; + extern _Atomic uint64_t g_free_remote_box_calls; + extern uint64_t g_superslabs_allocated; + extern uint64_t g_superslabs_freed; + + fprintf(stderr, "\n[EXIT DEBUG] SuperSlab Accounting:\n"); + fprintf(stderr, " 
g_superslabs_allocated = %llu\n", (unsigned long long)g_superslabs_allocated); + fprintf(stderr, " g_superslabs_freed = %llu\n", (unsigned long long)g_superslabs_freed); + fprintf(stderr, " g_hak_tiny_free_calls = %llu\n", + (unsigned long long)atomic_load_explicit(&g_hak_tiny_free_calls, memory_order_relaxed)); + fprintf(stderr, " g_ss_remote_push_calls = %llu\n", + (unsigned long long)atomic_load_explicit(&g_ss_remote_push_calls, memory_order_relaxed)); + fprintf(stderr, " g_ss_active_dec_calls = %llu\n", + (unsigned long long)atomic_load_explicit(&g_ss_active_dec_calls, memory_order_relaxed)); + extern _Atomic uint64_t g_free_wrapper_calls; + fprintf(stderr, " g_free_wrapper_calls = %llu\n", + (unsigned long long)atomic_load_explicit(&g_free_wrapper_calls, memory_order_relaxed)); + fprintf(stderr, " g_free_ss_enter = %llu\n", + (unsigned long long)atomic_load_explicit(&g_free_ss_enter, memory_order_relaxed)); + fprintf(stderr, " g_free_local_box_calls = %llu\n", + (unsigned long long)atomic_load_explicit(&g_free_local_box_calls, memory_order_relaxed)); + fprintf(stderr, " g_free_remote_box_calls = %llu\n", + (unsigned long long)atomic_load_explicit(&g_free_remote_box_calls, memory_order_relaxed)); +} + +void tiny_destructors_configure_from_env(void) { + const char* tf = getenv("HAKMEM_TINY_FLUSH_ON_EXIT"); + if (tf && atoi(tf) != 0) { + g_flush_on_exit = 1; + } + const char* ud = getenv("HAKMEM_TINY_ULTRA_DEBUG"); + if (ud && atoi(ud) != 0) { + g_ultra_debug_on_exit = 1; + } + const char* pd = getenv("HAKMEM_TINY_PATH_DEBUG"); + if (pd) { + g_path_debug_on_exit = 1; + } +} + +void tiny_destructors_register_exit(void) { + if (g_flush_on_exit || g_ultra_debug_on_exit || g_path_debug_on_exit) { + atexit(hak_flush_tiny_exit); + } +} + +static int tiny_heap_stats_dump_enabled(void) { + static int g = -1; + if (__builtin_expect(g == -1, 0)) { + const char* eh = getenv("HAKMEM_TINY_HEAP_STATS_DUMP"); + const char* e = getenv("HAKMEM_TINY_C7_HEAP_STATS_DUMP"); + g = 
((eh && *eh && *eh != '0') || (e && *e && *e != '0')) ? 1 : 0; + } + return g; +} + +__attribute__((destructor)) +static void tiny_heap_stats_dump(void) { + if (!tiny_heap_stats_enabled() || !tiny_heap_stats_dump_enabled()) { + return; + } + for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) { + TinyHeapClassStats snap = { + .alloc_fast_current = atomic_load_explicit(&g_tiny_heap_stats[cls].alloc_fast_current, memory_order_relaxed), + .alloc_slow_prepare = atomic_load_explicit(&g_tiny_heap_stats[cls].alloc_slow_prepare, memory_order_relaxed), + .free_fast_local = atomic_load_explicit(&g_tiny_heap_stats[cls].free_fast_local, memory_order_relaxed), + .free_slow_fallback = atomic_load_explicit(&g_tiny_heap_stats[cls].free_slow_fallback, memory_order_relaxed), + .alloc_prepare_fail = atomic_load_explicit(&g_tiny_heap_stats[cls].alloc_prepare_fail, memory_order_relaxed), + .alloc_fail = atomic_load_explicit(&g_tiny_heap_stats[cls].alloc_fail, memory_order_relaxed), + }; + if (snap.alloc_fast_current == 0 && snap.alloc_slow_prepare == 0 && + snap.free_fast_local == 0 && snap.free_slow_fallback == 0 && + snap.alloc_prepare_fail == 0 && snap.alloc_fail == 0) { + continue; + } + fprintf(stderr, + "[HEAP_STATS cls=%d] alloc_fast_current=%llu alloc_slow_prepare=%llu free_fast_local=%llu free_slow_fallback=%llu alloc_prepare_fail=%llu alloc_fail=%llu\n", + cls, + (unsigned long long)snap.alloc_fast_current, + (unsigned long long)snap.alloc_slow_prepare, + (unsigned long long)snap.free_fast_local, + (unsigned long long)snap.free_slow_fallback, + (unsigned long long)snap.alloc_prepare_fail, + (unsigned long long)snap.alloc_fail); + } + TinyC7PageStats ps = { + .prepare_calls = atomic_load_explicit(&g_c7_page_stats.prepare_calls, memory_order_relaxed), + .prepare_with_current_null = atomic_load_explicit(&g_c7_page_stats.prepare_with_current_null, memory_order_relaxed), + .prepare_from_partial = atomic_load_explicit(&g_c7_page_stats.prepare_from_partial, memory_order_relaxed), + 
.current_set_from_free = atomic_load_explicit(&g_c7_page_stats.current_set_from_free, memory_order_relaxed), + .current_dropped_to_partial = atomic_load_explicit(&g_c7_page_stats.current_dropped_to_partial, memory_order_relaxed), + }; + if (ps.prepare_calls || ps.prepare_with_current_null || ps.prepare_from_partial || + ps.current_set_from_free || ps.current_dropped_to_partial) { + fprintf(stderr, + "[C7_PAGE_STATS] prepare_calls=%llu prepare_with_current_null=%llu prepare_from_partial=%llu current_set_from_free=%llu current_dropped_to_partial=%llu\n", + (unsigned long long)ps.prepare_calls, + (unsigned long long)ps.prepare_with_current_null, + (unsigned long long)ps.prepare_from_partial, + (unsigned long long)ps.current_set_from_free, + (unsigned long long)ps.current_dropped_to_partial); + fflush(stderr); + } +} + +__attribute__((destructor)) +static void tiny_front_class_stats_dump(void) { + if (!tiny_front_class_stats_dump_enabled()) { + return; + } + for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) { + uint64_t a = atomic_load_explicit(&g_tiny_front_alloc_class[cls], memory_order_relaxed); + uint64_t f = atomic_load_explicit(&g_tiny_front_free_class[cls], memory_order_relaxed); + if (a == 0 && f == 0) { + continue; + } + fprintf(stderr, "[FRONT_CLASS cls=%d] alloc=%llu free=%llu\n", + cls, (unsigned long long)a, (unsigned long long)f); + } +} + +__attribute__((destructor)) +static void tiny_c7_delta_debug_destructor(void) { + if (tiny_c7_meta_light_enabled() && tiny_c7_delta_debug_enabled()) { + tiny_c7_heap_debug_dump_deltas(); + } + if (tiny_heap_meta_light_enabled_for_class(6) && tiny_c6_delta_debug_enabled()) { + tiny_c6_heap_debug_dump_deltas(); + } +} + +__attribute__((destructor)) +static void tiny_hotheap_v2_stats_dump(void) { + if (!tiny_hotheap_v2_stats_enabled()) { + return; + } + for (uint8_t ci = 0; ci < TINY_HOTHEAP_MAX_CLASSES; ci++) { + uint64_t alloc_calls = atomic_load_explicit(&g_tiny_hotheap_v2_alloc_calls[ci], memory_order_relaxed); + 
uint64_t route_hits = atomic_load_explicit(&g_tiny_hotheap_v2_route_hits[ci], memory_order_relaxed); + uint64_t alloc_fast = atomic_load_explicit(&g_tiny_hotheap_v2_alloc_fast[ci], memory_order_relaxed); + uint64_t alloc_lease = atomic_load_explicit(&g_tiny_hotheap_v2_alloc_lease[ci], memory_order_relaxed); + uint64_t alloc_fb = atomic_load_explicit(&g_tiny_hotheap_v2_alloc_fallback_v1[ci], memory_order_relaxed); + uint64_t free_calls = atomic_load_explicit(&g_tiny_hotheap_v2_free_calls[ci], memory_order_relaxed); + uint64_t free_fast = atomic_load_explicit(&g_tiny_hotheap_v2_free_fast[ci], memory_order_relaxed); + uint64_t free_fb = atomic_load_explicit(&g_tiny_hotheap_v2_free_fallback_v1[ci], memory_order_relaxed); + uint64_t cold_refill_fail = atomic_load_explicit(&g_tiny_hotheap_v2_cold_refill_fail[ci], memory_order_relaxed); + uint64_t cold_retire_calls = atomic_load_explicit(&g_tiny_hotheap_v2_cold_retire_calls[ci], memory_order_relaxed); + uint64_t retire_calls_v2 = atomic_load_explicit(&g_tiny_hotheap_v2_retire_calls_v2[ci], memory_order_relaxed); + uint64_t partial_pushes = atomic_load_explicit(&g_tiny_hotheap_v2_partial_pushes[ci], memory_order_relaxed); + uint64_t partial_pops = atomic_load_explicit(&g_tiny_hotheap_v2_partial_pops[ci], memory_order_relaxed); + uint64_t partial_peak = atomic_load_explicit(&g_tiny_hotheap_v2_partial_peak[ci], memory_order_relaxed); + uint64_t refill_with_cur = atomic_load_explicit(&g_tiny_hotheap_v2_refill_with_current[ci], memory_order_relaxed); + uint64_t refill_with_partial = atomic_load_explicit(&g_tiny_hotheap_v2_refill_with_partial[ci], memory_order_relaxed); + + TinyHotHeapV2PageStats ps = { + .prepare_calls = atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].prepare_calls, memory_order_relaxed), + .prepare_with_current_null = atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].prepare_with_current_null, memory_order_relaxed), + .prepare_from_partial = 
atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].prepare_from_partial, memory_order_relaxed), + .free_made_current = atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].free_made_current, memory_order_relaxed), + .page_retired = atomic_load_explicit(&g_tiny_hotheap_v2_page_stats[ci].page_retired, memory_order_relaxed), + }; + + if (!(alloc_calls || alloc_fast || alloc_lease || alloc_fb || free_calls || free_fast || free_fb || + ps.prepare_calls || ps.prepare_with_current_null || ps.prepare_from_partial || + ps.free_made_current || ps.page_retired || retire_calls_v2 || partial_pushes || partial_pops || partial_peak)) { + continue; + } + + tiny_route_kind_t route_kind = tiny_route_for_class(ci); + fprintf(stderr, + "[HOTHEAP_V2_STATS cls=%u route=%d] route_hits=%llu alloc_calls=%llu alloc_fast=%llu alloc_lease=%llu alloc_refill=%llu refill_cur=%llu refill_partial=%llu alloc_fb_v1=%llu alloc_route_fb=%llu cold_refill_fail=%llu cold_retire_calls=%llu retire_v2=%llu free_calls=%llu free_fast=%llu free_fb_v1=%llu prep_calls=%llu prep_null=%llu prep_from_partial=%llu free_made_current=%llu page_retired=%llu partial_push=%llu partial_pop=%llu partial_peak=%llu\n", + (unsigned)ci, + (int)route_kind, + (unsigned long long)route_hits, + (unsigned long long)alloc_calls, + (unsigned long long)alloc_fast, + (unsigned long long)alloc_lease, + (unsigned long long)atomic_load_explicit(&g_tiny_hotheap_v2_alloc_refill[ci], memory_order_relaxed), + (unsigned long long)refill_with_cur, + (unsigned long long)refill_with_partial, + (unsigned long long)alloc_fb, + (unsigned long long)atomic_load_explicit(&g_tiny_hotheap_v2_alloc_route_fb[ci], memory_order_relaxed), + (unsigned long long)cold_refill_fail, + (unsigned long long)cold_retire_calls, + (unsigned long long)retire_calls_v2, + (unsigned long long)free_calls, + (unsigned long long)free_fast, + (unsigned long long)free_fb, + (unsigned long long)ps.prepare_calls, + (unsigned long long)ps.prepare_with_current_null, + 
(unsigned long long)ps.prepare_from_partial, + (unsigned long long)ps.free_made_current, + (unsigned long long)ps.page_retired, + (unsigned long long)partial_pushes, + (unsigned long long)partial_pops, + (unsigned long long)partial_peak); + } +} + +static void tiny_heap_v2_stats_atexit(void) __attribute__((destructor)); +static void tiny_heap_v2_stats_atexit(void) { + tiny_heap_v2_print_stats(); +} + +static void tiny_alloc_1024_diag_atexit(void) __attribute__((destructor)); +static void tiny_alloc_1024_diag_atexit(void) { + // Priority-2: Use cached ENV + if (!HAK_ENV_TINY_ALLOC_1024_METRIC()) return; + fprintf(stderr, "\n[ALLOC_GE1024] per-class counts (size>=1024)\n"); + for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) { + uint64_t v = atomic_load_explicit(&g_tiny_alloc_ge1024[cls], memory_order_relaxed); + if (v) { + fprintf(stderr, " C%d=%llu", cls, (unsigned long long)v); + } + } + fprintf(stderr, "\n"); +} + +static void tiny_tls_sll_diag_atexit(void) __attribute__((destructor)); +static void tiny_tls_sll_diag_atexit(void) { +#if !HAKMEM_BUILD_RELEASE + // Priority-2: Use cached ENV + if (!HAK_ENV_TINY_SLL_DIAG()) return; + fprintf(stderr, "\n[TLS_SLL_DIAG] invalid head/push counts per class\n"); + for (int cls = 0; cls < TINY_NUM_CLASSES; cls++) { + uint64_t ih = atomic_load_explicit(&g_tls_sll_invalid_head[cls], memory_order_relaxed); + uint64_t ip = atomic_load_explicit(&g_tls_sll_invalid_push[cls], memory_order_relaxed); + if (ih || ip) { + fprintf(stderr, " C%d: invalid_head=%llu invalid_push=%llu\n", + cls, (unsigned long long)ih, (unsigned long long)ip); + } + } +#endif +} diff --git a/core/tiny_destructors.h b/core/tiny_destructors.h new file mode 100644 index 00000000..edb9b8c9 --- /dev/null +++ b/core/tiny_destructors.h @@ -0,0 +1,31 @@ +// tiny_destructors.h — boxed Tiny shutdown handling and stats dumps +#ifndef TINY_DESTRUCTORS_H +#define TINY_DESTRUCTORS_H + +#include <stdatomic.h> +#include <stdint.h> +#include <stdlib.h> + +#include "hakmem_tiny.h" + +typedef struct { + _Atomic uint64_t prepare_calls; 
+ _Atomic uint64_t prepare_with_current_null; + _Atomic uint64_t prepare_from_partial; + _Atomic uint64_t free_made_current; + _Atomic uint64_t page_retired; +} TinyHotHeapV2PageStats; + +static inline int tiny_hotheap_v2_stats_enabled(void) { + static int g = -1; + if (__builtin_expect(g == -1, 0)) { + const char* e = getenv("HAKMEM_TINY_HOTHEAP_V2_STATS"); + g = (e && *e && *e != '0') ? 1 : 0; + } + return g; +} + +void tiny_destructors_configure_from_env(void); +void tiny_destructors_register_exit(void); + +#endif // TINY_DESTRUCTORS_H diff --git a/core/tiny_failfast.d b/core/tiny_failfast.d index 956f5eef..e4098d0c 100644 --- a/core/tiny_failfast.d +++ b/core/tiny_failfast.d @@ -3,7 +3,12 @@ core/tiny_failfast.o: core/tiny_failfast.c core/hakmem_tiny_superslab.h \ core/superslab/superslab_inline.h core/superslab/superslab_types.h \ core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ + core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/hakmem_build_flags.h core/tiny_remote.h \ core/hakmem_tiny_superslab_constants.h core/hakmem_debug_master.h core/hakmem_tiny_superslab.h: @@ -14,6 +19,11 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: +core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/hakmem_build_flags.h: core/tiny_remote.h: diff --git a/core/tiny_free_magazine.inc.h b/core/tiny_free_magazine.inc.h index 
f5b9308c..a242888a 100644 --- a/core/tiny_free_magazine.inc.h +++ b/core/tiny_free_magazine.inc.h @@ -307,9 +307,6 @@ } // Spill half under class lock pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m; - // Profiling fix - struct timespec tss; - int ss_time = hkm_prof_begin(&tss); pthread_mutex_lock(lock); int spill = cap / 2; diff --git a/core/tiny_mmap_gate.h b/core/tiny_mmap_gate.h index f968a427..89bd2a1d 100644 --- a/core/tiny_mmap_gate.h +++ b/core/tiny_mmap_gate.h @@ -27,7 +27,8 @@ static inline SuperSlab* tiny_must_adopt_gate(int class_idx, TinyTLSSlab* tls) { if (__builtin_expect(s_cd_def == -1, 0)) { const char* cd = getenv("HAKMEM_TINY_SS_ADOPT_COOLDOWN"); int v = cd ? atoi(cd) : 32; // default: back off for 32 misses - if (v < 0) v = 0; if (v > 1024) v = 1024; + if (v < 0) v = 0; + if (v > 1024) v = 1024; s_cd_def = v; } if (s_cooldown[class_idx] > 0) { diff --git a/core/tiny_nextptr.h b/core/tiny_nextptr.h index 4894af31..6746c3ad 100644 --- a/core/tiny_nextptr.h +++ b/core/tiny_nextptr.h @@ -48,12 +48,12 @@ #include "box/tiny_header_box.h" // Per-thread trace context injected by PTR_NEXT_WRITE macro (for triage) -static __thread const char* g_tiny_next_tag = NULL; -static __thread const char* g_tiny_next_file = NULL; -static __thread int g_tiny_next_line = 0; -static __thread void* g_tiny_next_ra0 = NULL; -static __thread void* g_tiny_next_ra1 = NULL; -static __thread void* g_tiny_next_ra2 = NULL; +static __thread const char* g_tiny_next_tag __attribute__((unused)) = NULL; +static __thread const char* g_tiny_next_file __attribute__((unused)) = NULL; +static __thread int g_tiny_next_line __attribute__((unused)) = 0; +static __thread void* g_tiny_next_ra0 __attribute__((unused)) = NULL; +static __thread void* g_tiny_next_ra1 __attribute__((unused)) = NULL; +static __thread void* g_tiny_next_ra2 __attribute__((unused)) = NULL; // Compute freelist next-pointer offset within a block for the given class. 
// P0.1 updated: C0 and C7 use offset 0, C1-C6 use offset 1 (header preserved) diff --git a/core/tiny_publish.c b/core/tiny_publish.c index 80e03b7d..d848e435 100644 --- a/core/tiny_publish.c +++ b/core/tiny_publish.c @@ -4,6 +4,7 @@ #include "tiny_publish.h" #include "hakmem_tiny_stats_api.h" #include "tiny_debug_ring.h" +#include "hakmem_trace_master.h" #include #include diff --git a/core/tiny_refill_opt.h b/core/tiny_refill_opt.h index 3c9c5e70..22f9c2dd 100644 --- a/core/tiny_refill_opt.h +++ b/core/tiny_refill_opt.h @@ -317,7 +317,11 @@ static inline uint32_t trc_linear_carve(uint8_t* base, size_t bs, // SOLUTION: Write headers to ALL carved blocks (including C7) so splice detection works correctly. #if HAKMEM_TINY_HEADER_CLASSIDX // Write headers to all batch blocks (ALL classes C0-C7) + #if HAKMEM_BUILD_RELEASE + static _Atomic uint64_t g_carve_count __attribute__((unused)) = 0; + #else static _Atomic uint64_t g_carve_count = 0; + #endif for (uint32_t i = 0; i < batch; i++) { uint8_t* block = cursor + (i * stride); PTR_TRACK_CARVE((void*)block, class_idx); diff --git a/core/tiny_route.h b/core/tiny_route.h index 2e1ac6c9..9f479e79 100644 --- a/core/tiny_route.h +++ b/core/tiny_route.h @@ -22,9 +22,9 @@ // 19: first_free_transition // 20: mailbox_publish -static __thread uint64_t g_route_fp; -static __thread uint32_t g_route_seq; -static __thread int g_route_active; +static __thread uint64_t g_route_fp __attribute__((unused)); +static __thread uint32_t g_route_seq __attribute__((unused)); +static __thread int g_route_active __attribute__((unused)); static int g_route_enable_env = -1; static int g_route_sample_lg = -1; @@ -40,7 +40,8 @@ static inline uint32_t route_sample_mask(void) { if (__builtin_expect(g_route_sample_lg == -1, 0)) { const char* e = getenv("HAKMEM_ROUTE_SAMPLE_LG"); int lg = (e && *e) ? 
atoi(e) : 10; // 1/1024 既定 - if (lg < 0) lg = 0; if (lg > 24) lg = 24; + if (lg < 0) lg = 0; + if (lg > 24) lg = 24; g_route_sample_lg = lg; } return (g_route_sample_lg >= 31) ? 0xFFFFFFFFu : ((1u << g_route_sample_lg) - 1u); diff --git a/core/tiny_superslab_free.inc.h b/core/tiny_superslab_free.inc.h index 72bc2f27..0849973d 100644 --- a/core/tiny_superslab_free.inc.h +++ b/core/tiny_superslab_free.inc.h @@ -171,7 +171,7 @@ static inline void hak_tiny_free_superslab(void* ptr, SuperSlab* ss) { do { uint8_t hdr_cls = tiny_region_id_read_header(ptr); uint8_t meta_cls = meta->class_idx; - if (__builtin_expect(hdr_cls >= 0 && hdr_cls != meta_cls, 0)) { + if (__builtin_expect(hdr_cls != meta_cls, 0)) { static _Atomic uint32_t g_hdr_meta_mismatch = 0; uint32_t n = atomic_fetch_add_explicit(&g_hdr_meta_mismatch, 1, memory_order_relaxed); if (n < 16) { @@ -216,10 +216,10 @@ static inline void hak_tiny_free_superslab(void* ptr, SuperSlab* ss) { } } } while (0); +#if !HAKMEM_BUILD_RELEASE // DEBUG LOGGING - Track freelist operations // Priority-2: Use cached ENV (eliminate lazy-init TLS overhead) static __thread int free_count = 0; -#if !HAKMEM_BUILD_RELEASE if (HAK_ENV_SS_FREE_DEBUG() && (free_count++ % 1000) == 0) { #else if (0) { diff --git a/docs/analysis/ENV_PROFILE_PRESETS.md b/docs/analysis/ENV_PROFILE_PRESETS.md new file mode 100644 index 00000000..f4c5d438 --- /dev/null +++ b/docs/analysis/ENV_PROFILE_PRESETS.md @@ -0,0 +1,120 @@ +# ENV Profile Presets (HAKMEM) + +よく使う構成を 3 つのプリセットにまとめました。まずここからコピペし、必要な ENV だけを追加してください。v2 系や LEGACY 専用オプションは明示 opt-in で扱います。 +ベンチバイナリでは `HAKMEM_PROFILE=<名前>` をセットすると、ここで定義した ENV を自動で注入します(既に設定済みの ENV は上書きしません)。 + +--- + +## Profile 1: MIXED_TINYV3_C7_SAFE(標準 Mixed 16–1024B) + +### 目的 +- Mixed 16–1024B の標準ベンチ用。 +- C7-only SmallObject v3 + Tiny front v3 + LUT + fast classify ON。 +- Tiny/Pool v2 はすべて OFF。 + +### ENV 最小セット(Release) +```sh +HAKMEM_BENCH_MIN_SIZE=16 +HAKMEM_BENCH_MAX_SIZE=1024 +HAKMEM_TINY_HEAP_PROFILE=C7_SAFE 
+HAKMEM_TINY_C7_HOT=1 +HAKMEM_TINY_HOTHEAP_V2=0 +HAKMEM_SMALL_HEAP_V3_ENABLED=1 +HAKMEM_SMALL_HEAP_V3_CLASSES=0x80 +HAKMEM_POOL_V2_ENABLED=0 +HAKMEM_TINY_FRONT_V3_ENABLED=1 +HAKMEM_TINY_FRONT_V3_LUT_ENABLED=1 +HAKMEM_TINY_PTR_FAST_CLASSIFY_ENABLED=1 +HAKMEM_FREE_POLICY=batch +HAKMEM_THP=auto +``` + +### Optional settings +- When you want stats: +```sh +HAKMEM_TINY_HEAP_STATS=1 +HAKMEM_TINY_HEAP_STATS_DUMP=1 +HAKMEM_SMALL_HEAP_V3_STATS=1 +``` +- Leave the v2 series alone (under C7_SAFE, Pool v2 / Tiny v2 are always OFF). +- Stopgap for environments with a tight vm.max_map_count where you want to avoid Fail-Fast (performance is roughly equal to slightly lower): +```sh +HAKMEM_FREE_POLICY=keep +HAKMEM_DISABLE_BATCH=1 +HAKMEM_SS_MADVISE_STRICT=0 +``` + +--- + +## Profile 2: C6_HEAVY_LEGACY_POOLV1 (mid/smallmid C6-heavy bench) + +### Purpose +- For C6-heavy mid/smallmid benchmarks. +- C6 pinned to v1 (C6 v3 OFF), Pool v2 OFF. Pool v1 flatten is bench-only opt-in. + +### ENV (v1 baseline) +```sh +HAKMEM_BENCH_MIN_SIZE=257 +HAKMEM_BENCH_MAX_SIZE=768 +HAKMEM_TINY_HEAP_PROFILE=C7_SAFE +HAKMEM_TINY_C6_HOT=1 +HAKMEM_TINY_HOTHEAP_V2=0 +HAKMEM_SMALL_HEAP_V3_ENABLED=1 +HAKMEM_SMALL_HEAP_V3_CLASSES=0x80 # C7-only v3, C6 v3 OFF +HAKMEM_POOL_V2_ENABLED=0 +HAKMEM_POOL_V1_FLATTEN_ENABLED=0 # flatten OFF for the first run +``` + +### For Pool v1 flatten A/B (LEGACY only) +```sh +# LEGACY + flatten ON (research/bench only) +HAKMEM_TINY_HEAP_PROFILE=LEGACY +HAKMEM_POOL_V2_ENABLED=0 +HAKMEM_POOL_V1_FLATTEN_ENABLED=1 +HAKMEM_POOL_V1_FLATTEN_STATS=1 +``` +- flatten is LEGACY-only. Under C7_SAFE / C7_ULTRA_BENCH it is force-disabled in code. + +--- + +## Profile 3: DEBUG_TINY_FRONT_PERF (DEBUG profile for perf) + +### Purpose +- For perf record of Tiny front v3 (including C7 v3). +- Measure with symbols at -O0 / -g / LTO OFF. + +### Build example +```sh +make clean +CFLAGS='-O0 -g' USE_LTO=0 OPT_LEVEL=0 NATIVE=0 \ + make bench_random_mixed_hakmem -j4 +``` + +### ENV +```sh +HAKMEM_BENCH_MIN_SIZE=16 +HAKMEM_BENCH_MAX_SIZE=1024 +HAKMEM_TINY_HEAP_PROFILE=C7_SAFE +HAKMEM_TINY_C7_HOT=1 +HAKMEM_TINY_HOTHEAP_V2=0 +HAKMEM_SMALL_HEAP_V3_ENABLED=1 +HAKMEM_SMALL_HEAP_V3_CLASSES=0x80 +HAKMEM_POOL_V2_ENABLED=0 +HAKMEM_TINY_FRONT_V3_ENABLED=1
+HAKMEM_TINY_FRONT_V3_LUT_ENABLED=1 +HAKMEM_TINY_PTR_FAST_CLASSIFY_ENABLED=1 +``` + +### perf example +```sh +perf record -F 5000 --call-graph dwarf -e cycles:u \ + -o perf.data.tiny_front_tf3 \ + ./bench_random_mixed_hakmem 1000000 400 1 +``` +- While measuring with perf, keep logging OFF as much as possible and base the ENVs on MIXED_TINYV3_C7_SAFE. + +--- + +### Common notes +- Stacking one-off ENVs outside a preset makes runs hard to reproduce, so always start from one of the presets above and write down every change. +- The v2 series (Pool v2 / Tiny v2) is opt-in per bench. If you do not need it, keep it at 0. diff --git a/docs/analysis/SMALLOBJECT_HOTBOX_V3_DESIGN.md b/docs/analysis/SMALLOBJECT_HOTBOX_V3_DESIGN.md index 2b70c28c..f3e98d92 100644 --- a/docs/analysis/SMALLOBJECT_HOTBOX_V3_DESIGN.md +++ b/docs/analysis/SMALLOBJECT_HOTBOX_V3_DESIGN.md @@ -28,6 +28,23 @@ SmallObject HotBox v3 Design (Tiny + mid/smallmid unified proposal) - Route: add `TINY_ROUTE_SMALL_HEAP_V3` to `tiny_route_env_box.h`. Dispatch to v3 via the route snapshot only when the class bit is set. - Front: malloc/free try the v3 route and fall back to v2/v1/legacy on failure, in a straight-line path. Default is OFF, so behavior is unchanged. +### Phase S1: C6 v3 research box (bench-only unlock without breaking C7) +- Gate: in `HAKMEM_SMALL_HEAP_V3_ENABLED`/`CLASSES`, bit7=C7 (default ON=0x80), bit6=C6 (research-only, default OFF). When exercising C6, also set `HAKMEM_TINY_C6_HOT=1` so the tiny front is reliably taken. +- Cold IF: apply `smallobject_cold_iface_v1.h` to C6 as well, using `tiny_heap_prepare_page`/`page_becomes_empty` in the same shape as C7. Add `page_of_fail` to the v3 stats to measure page_of misses on the free side. +- Bench (Release, Tiny/Pool v2 OFF, ws=400, iters=1M): + - C6-heavy A/B: `MIN_SIZE=257 MAX_SIZE=768`. `CLASSES=0x80` (C6 v1) → **47.71M ops/s**; `CLASSES=0x40` (C6 v3, stats ON) → **36.77M ops/s** (cls6 `route_hits=266,930 alloc_refill=5 fb_v1=0 page_of_fail=0`). v3 is roughly -23%. + - Mixed 16–1024B: `CLASSES=0x80` (C7-only) → **47.45M ops/s**; `CLASSES=0xC0` (C6+C7 v3, stats ON) → **44.45M ops/s** (cls6 `route_hits=137,307 alloc_refill=1 fb_v1=0 page_of_fail=0` / cls7 `alloc_refill=2,446`). Roughly -6%. +- Policy: the standard profile is fixed at `HAKMEM_SMALL_HEAP_V3_CLASSES=0x80` (C7-only v3). C6 v3 stays explicit opt-in for bench/research only and is kept off the C6-heavy/Mixed mainline. It stays parked in the research box until performance recovers. +- Recommended preset for running C6-heavy with C6 pinned to v1 (an explicit example, to avoid confusing it with research runs): + ``` + HAKMEM_BENCH_MIN_SIZE=257 + HAKMEM_BENCH_MAX_SIZE=768 + HAKMEM_TINY_HEAP_PROFILE=C7_SAFE + HAKMEM_TINY_C6_HOT=1 + HAKMEM_SMALL_HEAP_V3_ENABLED=1 + HAKMEM_SMALL_HEAP_V3_CLASSES=0x80 # C7-only v3 + ``` + Design goals (SmallObjectHotBox v3) --------------------------------- - Target size bands: diff --git a/docs/analysis/TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md b/docs/analysis/TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md index 8f93620d..ab0a92bd 100644 --- a/docs/analysis/TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md +++ b/docs/analysis/TINY_CPU_HOTPATH_USERLAND_ANALYSIS.md @@ -64,6 +64,23 @@ - route/guard checks (unified_cache_enabled / tiny_guard_is_enabled / classify_ptr) total around ~6%. - The next promising target is flattening the "size→class→route front stage + header". +## TF3 pre-measurement (DEBUG symbols, front v3+LUT ON, C7-only v3) + +Environment: `HAKMEM_BENCH_MIN_SIZE=16 HAKMEM_BENCH_MAX_SIZE=1024 HAKMEM_TINY_HEAP_PROFILE=C7_SAFE HAKMEM_TINY_C7_HOT=1 HAKMEM_TINY_HOTHEAP_V2=0 HAKMEM_POOL_V2_ENABLED=0 HAKMEM_SMALL_HEAP_V3_ENABLED=1 HAKMEM_SMALL_HEAP_V3_CLASSES=0x80 HAKMEM_TINY_FRONT_V3_ENABLED=1 HAKMEM_TINY_FRONT_V3_LUT_ENABLED=1` +Build: `BUILD_FLAVOR=debug OPT_LEVEL=0 USE_LTO=0 EXTRA_CFLAGS=-g` +Bench: `perf record -F5000 --call-graph dwarf -e cycles:u -o perf.data.tiny_front_tf3 ./bench_random_mixed_hakmem 1000000 400 1` +Throughput: **12.39M ops/s** (DEBUG/-O0 territory) + +- `ss_map_lookup`: **7.3% self** (mostly the ptr→SuperSlab check on the free side; still high even with C7 v3) +- `hak_super_lookup`: **4.0% self** (lookup fallback share) +- `classify_ptr`: **0.64% self** (size→class check at the free entry) +- `mid_desc_lookup`: **0.43% self** (descriptor lookup on the mid path) +- Other: free/malloc/main account for a bit over 30%; the header-write family was buried under this run's debug logging and could not be confirmed. + +Takeaways: +- Even with front v3 + LUT ON, `ss_map_lookup` / `hak_super_lookup` on the free side still hold ~11%, so there is plenty of room to hit this directly with FAST classify. +- `classify_ptr` is under 1%, but dropping it together with `ss_map_lookup` should bring us toward the +5–10% target. + ### Front v3 snapshot introduction notes - Added `TinyFrontV3Snapshot` and, when front v3 is ON, a path that caches `unified_cache_on / tiny_guard_on / header_mode` exactly once (default OFF).
- Mixed 16–1024B (ws=400, iters=1M, C7 v3 ON, Tiny/Pool v2 OFF): no behavior change (slow=1 maintained). Hotspots are still centered on the front stage (`tiny_region_id_write_header`, `ss_map_lookup`, guard/route checks). @@ -82,3 +99,11 @@ - header_v3=0: 44.29M ops/s, C7_PAGE_STATS prepare_calls=2446 - header_v3=1 + SKIP_C7=1: 43.68M ops/s (roughly -1.4%), prepare_calls=2446, fallback/page_of_fail=0 - Takeaway: simplifying the C7 v3 header alone shows no perf gain. We need to drop the free-side header dependency, or explore header light/off in a separate box. + +## TF3: A/B after implementing ptr fast classify (C7-only v3, front v3+LUT ON) +- Release build, ws=400, iters=1M, ENV per the TF3 baseline (`C7_SAFE`, C7_HOT=1, v2/pool v2=0, v3 classes=0x80, front v3/LUT ON). +- Throughput (ops/s): + - PTR_FAST_CLASSIFY=0: **33.91M** + - PTR_FAST_CLASSIFY=1: **36.67M** (roughly +8.1%) +- DEBUG perf (same ENV, gate=1, cycles@5k, dwarf): `ss_map_lookup` self drops **7.3% → 0.9%**, and `hak_super_lookup` disappears from the top. In exchange, the in-TLS page checks (`smallobject_hotbox_v3_can_own_c7` / `so_page_of`) move to ~5.5% combined. `classify_ptr` ticks up to 2–3% (fallback share on misses). +- Takeaway: the Superslab lookup round trip in C7 v3 free is mostly eliminated, landing within the +5–10% target. The TLS walk in the fast-path check is the new hotspot, but it currently costs less than the lookup and is acceptable. diff --git a/docs/design/TINY_FRONT_V3_FLATTENING_GUIDE.md b/docs/design/TINY_FRONT_V3_FLATTENING_GUIDE.md index 5f0ea4f7..52551f2b 100644 --- a/docs/design/TINY_FRONT_V3_FLATTENING_GUIDE.md +++ b/docs/design/TINY_FRONT_V3_FLATTENING_GUIDE.md @@ -90,3 +90,27 @@ Thinning the front hot path when C7 v3 is ON for Mixed 16–1024B - header_v3=0: 44.29M ops/s, C7_PAGE_STATS prepare_calls=2446 - header_v3=1 + SKIP_C7=1: 43.68M ops/s (roughly -1.4%), prepare_calls=2446, v3 fallback/page_of_fail=0 - Takeaway: the short-run header skip alone brings no improvement. Removing the free-side header dependency, or redesigning header_light, is for a separate phase. + +## Phase TF3: ptr fast classify (design notes / implementation TODO) +- ENV gate (default 0, ON only for A/B) + - `HAKMEM_TINY_PTR_FAST_CLASSIFY_ENABLED` +- Goal: at the C7 v3 free entry, send only "obviously Tiny/C7 pages" down the fast path and avoid the `classify_ptr → ss_map_lookup → mid_desc_lookup` round trip. On a miss, always fall back to the conventional classify_ptr path. +- Design (free side, assuming malloc_tiny_fast.h): + 1.
Check via the Snapshot whether the gate and C7 v3 are enabled (do nothing for C6/Pool/off). + 2. From the ptr, decide whether it is a "self-thread C7 v3 page" using only the TLS context / so_page_of / page metadata. + 3. Check OK → go straight to C7 v3 free without passing through `ss_map_lookup`. + 4. Check NG → fall through to the current classify_ptr/ss_map_lookup as-is (Box boundaries unchanged). +- TODO for the implementer: + - Add the ENV gate (default 0). + - Add a C7 v3-only fast classify at the free entry (always with a fallback). + - A/B: Mixed 16–1024B, C7 v3 ON, front v3/LUT ON, Tiny/Pool v2 OFF + - baseline: PTR_FAST_CLASSIFY=0 + - trial: PTR_FAST_CLASSIFY=1 + - Expectation: no segv/assert, lower `ss_map_lookup / classify_ptr` self%, ops/s trending +a few% to +10%. + +### Post-implementation notes (2025/TF3) +- Implementation: added the `tiny_ptr_fast_classify_enabled` gate; at the free entry, if the C7 v3 TLS page check (`smallobject_hotbox_v3_can_own_c7`) hits, go straight to `so_free`. Misses fall back to the conventional route/classify. +- Mixed 16–1024B (C7-only v3, front v3+LUT ON, v2/pool v2 OFF, ws=400, iters=1M, Release): + - OFF: 33.9M ops/s → ON: 36.7M ops/s (roughly +8.1%). +- DEBUG perf (cycles@5k, dwarf, gate=1): `ss_map_lookup` self 7.3% → 0.9%; `hak_super_lookup` drops out of the top. The TLS walk (`smallobject_hotbox_v3_can_own_c7`) shows up at ~5.5% but costs less than the lookup round trip. +- Rollout plan: the gain is stable on the Mixed baseline, so with front v3/LUT ON as a prerequisite, fast classify is also a default-ON candidate. Keep the structure where ENV=0 turns it back off immediately. diff --git a/hakmem.d b/hakmem.d index d161be97..f5d6e4f4 100644 --- a/hakmem.d +++ b/hakmem.d @@ -10,23 +10,29 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \ core/hakmem_tiny_superslab_constants.h core/superslab/superslab_inline.h \ core/superslab/superslab_types.h core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ + core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/tiny_remote.h core/hakmem_tiny_superslab_constants.h \ core/tiny_fastcache.h
core/hakmem_env_cache.h \ core/box/tiny_next_ptr_box.h core/hakmem_tiny_config.h \ core/tiny_nextptr.h core/tiny_region_id.h core/tiny_box_geometry.h \ - core/ptr_track.h core/hakmem_super_registry.h core/box/ss_addr_map_box.h \ - core/box/../hakmem_build_flags.h core/box/super_reg_box.h \ - core/tiny_debug_api.h core/box/tiny_layout_box.h \ + core/ptr_track.h core/tiny_debug_api.h core/box/tiny_layout_box.h \ core/box/../hakmem_tiny_config.h core/box/tiny_header_box.h \ - core/box/tiny_layout_box.h core/box/../tiny_region_id.h \ - core/hakmem_elo.h core/hakmem_ace_stats.h core/hakmem_batch.h \ - core/hakmem_evo.h core/hakmem_debug.h core/hakmem_prof.h \ - core/hakmem_syscall.h core/hakmem_ace_controller.h \ + core/box/../hakmem_build_flags.h core/box/tiny_layout_box.h \ + core/box/../tiny_region_id.h core/hakmem_elo.h core/hakmem_ace_stats.h \ + core/hakmem_batch.h core/hakmem_evo.h core/hakmem_debug.h \ + core/hakmem_prof.h core/hakmem_syscall.h core/hakmem_ace_controller.h \ core/hakmem_ace_metrics.h core/hakmem_ace_ucb1.h \ core/box/bench_fast_box.h core/ptr_trace.h core/hakmem_trace_master.h \ core/hakmem_stats_master.h core/box/hak_kpi_util.inc.h \ core/box/hak_core_init.inc.h core/hakmem_phase7_config.h \ + core/box/libm_reloc_guard_box.h core/box/init_bench_preset_box.h \ + core/box/init_diag_box.h core/box/init_env_box.h \ + core/box/../tiny_destructors.h core/box/../hakmem_tiny.h \ core/box/ss_hot_prewarm_box.h core/box/hak_alloc_api.inc.h \ core/box/../hakmem_tiny.h core/box/../hakmem_pool.h \ core/box/../hakmem_smallmid.h core/box/tiny_heap_env_box.h \ @@ -48,10 +54,9 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \ core/box/../hakmem_build_flags.h core/box/../box/ss_hot_cold_box.h \ core/box/../box/../superslab/superslab_types.h \ core/box/../box/ss_allocation_box.h core/box/../hakmem_debug_master.h \ - core/box/../hakmem_tiny.h core/box/../hakmem_tiny_config.h \ - core/box/../hakmem_shared_pool.h 
core/box/../superslab/superslab_types.h \ - core/box/../hakmem_internal.h core/box/../tiny_region_id.h \ - core/box/../hakmem_tiny_integrity.h \ + core/box/../hakmem_tiny_config.h core/box/../hakmem_shared_pool.h \ + core/box/../superslab/superslab_types.h core/box/../hakmem_internal.h \ + core/box/../tiny_region_id.h core/box/../hakmem_tiny_integrity.h \ core/box/../box/slab_freelist_atomic.h \ core/box/../tiny_free_fast_v2.inc.h core/box/../box/tls_sll_box.h \ core/box/../box/../hakmem_internal.h \ @@ -75,8 +80,8 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \ core/box/../superslab/superslab_inline.h \ core/box/../box/ss_slab_meta_box.h core/box/../box/free_remote_box.h \ core/hakmem_tiny_integrity.h core/box/../box/ptr_conversion_box.h \ - core/box/hak_exit_debug.inc.h core/box/hak_wrappers.inc.h \ - core/box/front_gate_classifier.h core/box/../front/malloc_tiny_fast.h \ + core/box/hak_wrappers.inc.h core/box/front_gate_classifier.h \ + core/box/../front/malloc_tiny_fast.h \ core/box/../front/../hakmem_build_flags.h \ core/box/../front/../hakmem_tiny_config.h \ core/box/../front/../superslab/superslab_inline.h \ @@ -132,6 +137,11 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: +core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/tiny_remote.h: core/hakmem_tiny_superslab_constants.h: @@ -143,14 +153,11 @@ core/tiny_nextptr.h: core/tiny_region_id.h: core/tiny_box_geometry.h: core/ptr_track.h: -core/hakmem_super_registry.h: -core/box/ss_addr_map_box.h: -core/box/../hakmem_build_flags.h: -core/box/super_reg_box.h: core/tiny_debug_api.h: core/box/tiny_layout_box.h: core/box/../hakmem_tiny_config.h: core/box/tiny_header_box.h: 
+core/box/../hakmem_build_flags.h: core/box/tiny_layout_box.h: core/box/../tiny_region_id.h: core/hakmem_elo.h: @@ -170,6 +177,12 @@ core/hakmem_stats_master.h: core/box/hak_kpi_util.inc.h: core/box/hak_core_init.inc.h: core/hakmem_phase7_config.h: +core/box/libm_reloc_guard_box.h: +core/box/init_bench_preset_box.h: +core/box/init_diag_box.h: +core/box/init_env_box.h: +core/box/../tiny_destructors.h: +core/box/../hakmem_tiny.h: core/box/ss_hot_prewarm_box.h: core/box/hak_alloc_api.inc.h: core/box/../hakmem_tiny.h: @@ -208,7 +221,6 @@ core/box/../box/ss_hot_cold_box.h: core/box/../box/../superslab/superslab_types.h: core/box/../box/ss_allocation_box.h: core/box/../hakmem_debug_master.h: -core/box/../hakmem_tiny.h: core/box/../hakmem_tiny_config.h: core/box/../hakmem_shared_pool.h: core/box/../superslab/superslab_types.h: @@ -249,7 +261,6 @@ core/box/../box/ss_slab_meta_box.h: core/box/../box/free_remote_box.h: core/hakmem_tiny_integrity.h: core/box/../box/ptr_conversion_box.h: -core/box/hak_exit_debug.inc.h: core/box/hak_wrappers.inc.h: core/box/front_gate_classifier.h: core/box/../front/malloc_tiny_fast.h: diff --git a/hakmem_batch.d b/hakmem_batch.d index 9b182e7c..e6119898 100644 --- a/hakmem_batch.d +++ b/hakmem_batch.d @@ -1,12 +1,14 @@ hakmem_batch.o: core/hakmem_batch.c core/hakmem_batch.h core/hakmem_sys.h \ core/hakmem_whale.h core/hakmem_env_cache.h core/box/ss_os_acquire_box.h \ - core/hakmem_internal.h core/hakmem.h core/hakmem_build_flags.h \ - core/hakmem_config.h core/hakmem_features.h core/box/ptr_type_box.h + core/box/madvise_guard_box.h core/hakmem_internal.h core/hakmem.h \ + core/hakmem_build_flags.h core/hakmem_config.h core/hakmem_features.h \ + core/box/ptr_type_box.h core/hakmem_batch.h: core/hakmem_sys.h: core/hakmem_whale.h: core/hakmem_env_cache.h: core/box/ss_os_acquire_box.h: +core/box/madvise_guard_box.h: core/hakmem_internal.h: core/hakmem.h: core/hakmem_build_flags.h: diff --git a/hakmem_l25_pool.d b/hakmem_l25_pool.d index 
82bfb0ad..e6975f78 100644 --- a/hakmem_l25_pool.d +++ b/hakmem_l25_pool.d @@ -2,9 +2,9 @@ hakmem_l25_pool.o: core/hakmem_l25_pool.c core/hakmem_l25_pool.h \ core/hakmem_config.h core/hakmem_features.h core/hakmem_internal.h \ core/hakmem.h core/hakmem_build_flags.h core/hakmem_sys.h \ core/hakmem_whale.h core/box/ptr_type_box.h core/box/ss_os_acquire_box.h \ - core/hakmem_syscall.h core/box/pagefault_telemetry_box.h \ - core/page_arena.h core/hakmem_prof.h core/hakmem_debug.h \ - core/hakmem_policy.h + core/box/madvise_guard_box.h core/hakmem_syscall.h \ + core/box/pagefault_telemetry_box.h core/page_arena.h core/hakmem_prof.h \ + core/hakmem_debug.h core/hakmem_policy.h core/hakmem_l25_pool.h: core/hakmem_config.h: core/hakmem_features.h: @@ -15,6 +15,7 @@ core/hakmem_sys.h: core/hakmem_whale.h: core/box/ptr_type_box.h: core/box/ss_os_acquire_box.h: +core/box/madvise_guard_box.h: core/hakmem_syscall.h: core/box/pagefault_telemetry_box.h: core/page_arena.h: diff --git a/hakmem_learner.d b/hakmem_learner.d index 2ba548ab..b99a3fad 100644 --- a/hakmem_learner.d +++ b/hakmem_learner.d @@ -9,7 +9,12 @@ hakmem_learner.o: core/hakmem_learner.c core/hakmem_learner.h \ core/superslab/superslab_inline.h core/superslab/superslab_types.h \ core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ + core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/tiny_remote.h core/hakmem_tiny_superslab_constants.h \ core/box/learner_env_box.h core/box/../hakmem_config.h core/hakmem_learner.h: @@ -37,6 +42,11 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: 
core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: +core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/tiny_remote.h: core/hakmem_tiny_superslab_constants.h: diff --git a/hakmem_pool.d b/hakmem_pool.d index 9858eef9..15e96d82 100644 --- a/hakmem_pool.d +++ b/hakmem_pool.d @@ -4,6 +4,7 @@ hakmem_pool.o: core/hakmem_pool.c core/hakmem_pool.h \ core/hakmem_build_flags.h core/hakmem_sys.h core/hakmem_whale.h \ core/box/ptr_type_box.h core/box/pool_hotbox_v2_header_box.h \ core/hakmem_syscall.h core/box/pool_hotbox_v2_box.h core/hakmem_pool.h \ + core/box/pool_zero_mode_box.h core/box/../hakmem_env_cache.h \ core/hakmem_prof.h core/hakmem_policy.h core/hakmem_debug.h \ core/box/pool_tls_types.inc.h core/box/pool_mid_desc.inc.h \ core/box/pool_mid_tc.inc.h core/box/pool_mf2_types.inc.h \ @@ -12,7 +13,7 @@ hakmem_pool.o: core/hakmem_pool.c core/hakmem_pool.h \ core/box/pool_init_api.inc.h core/box/pool_stats.inc.h \ core/box/pool_api.inc.h core/box/pagefault_telemetry_box.h \ core/box/pool_hotbox_v2_box.h core/box/tiny_heap_env_box.h \ - core/box/c7_hotpath_env_box.h + core/box/c7_hotpath_env_box.h core/box/pool_zero_mode_box.h core/hakmem_pool.h: core/box/hak_lane_classify.inc.h: core/hakmem_config.h: @@ -27,6 +28,8 @@ core/box/pool_hotbox_v2_header_box.h: core/hakmem_syscall.h: core/box/pool_hotbox_v2_box.h: core/hakmem_pool.h: +core/box/pool_zero_mode_box.h: +core/box/../hakmem_env_cache.h: core/hakmem_prof.h: core/hakmem_policy.h: core/hakmem_debug.h: @@ -45,3 +48,4 @@ core/box/pagefault_telemetry_box.h: core/box/pool_hotbox_v2_box.h: core/box/tiny_heap_env_box.h: core/box/c7_hotpath_env_box.h: +core/box/pool_zero_mode_box.h: diff --git a/hakmem_shared_pool.d b/hakmem_shared_pool.d index e2104bab..27845403 100644 --- a/hakmem_shared_pool.d +++ b/hakmem_shared_pool.d @@ -4,32 +4,36 @@ 
hakmem_shared_pool.o: core/hakmem_shared_pool.c \ core/hakmem_tiny_superslab.h core/superslab/superslab_inline.h \ core/superslab/superslab_types.h core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ + core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/hakmem_build_flags.h core/tiny_remote.h \ core/hakmem_tiny_superslab_constants.h core/hakmem_debug_master.h \ core/hakmem_stats_master.h core/box/ss_slab_meta_box.h \ core/box/../superslab/superslab_types.h core/box/slab_freelist_atomic.h \ core/box/tiny_next_ptr_box.h core/hakmem_tiny_config.h \ core/tiny_nextptr.h core/tiny_region_id.h core/tiny_box_geometry.h \ - core/ptr_track.h core/hakmem_super_registry.h core/box/ss_addr_map_box.h \ - core/box/../hakmem_build_flags.h core/box/super_reg_box.h \ - core/hakmem_tiny.h core/hakmem_trace.h core/hakmem_tiny_mini_mag.h \ - core/box/hak_lane_classify.inc.h core/box/ptr_type_box.h \ - core/tiny_debug_api.h core/box/tiny_layout_box.h \ + core/ptr_track.h core/hakmem_tiny.h core/hakmem_trace.h \ + core/hakmem_tiny_mini_mag.h core/box/hak_lane_classify.inc.h \ + core/box/ptr_type_box.h core/tiny_debug_api.h core/box/tiny_layout_box.h \ core/box/../hakmem_tiny_config.h core/box/tiny_header_box.h \ - core/box/tiny_layout_box.h core/box/../tiny_region_id.h \ - core/box/ss_hot_cold_box.h core/box/pagefault_telemetry_box.h \ - core/box/tls_sll_drain_box.h core/box/tls_sll_box.h \ - core/box/../hakmem_internal.h core/box/../hakmem.h \ - core/box/../hakmem_build_flags.h core/box/../hakmem_config.h \ - core/box/../hakmem_features.h core/box/../hakmem_sys.h \ - core/box/../hakmem_whale.h core/box/../box/ptr_type_box.h \ - 
core/box/../hakmem_debug_master.h core/box/../tiny_remote.h \ - core/box/../hakmem_tiny_integrity.h core/box/../hakmem_tiny.h \ - core/box/../ptr_track.h core/box/../ptr_trace.h \ - core/box/../hakmem_trace_master.h core/box/../hakmem_stats_master.h \ - core/box/../tiny_debug_ring.h core/box/ss_addr_map_box.h \ - core/box/../superslab/superslab_inline.h core/box/tiny_ptr_bridge_box.h \ + core/box/../hakmem_build_flags.h core/box/tiny_layout_box.h \ + core/box/../tiny_region_id.h core/box/ss_hot_cold_box.h \ + core/box/pagefault_telemetry_box.h core/box/tls_sll_drain_box.h \ + core/box/tls_sll_box.h core/box/../hakmem_internal.h \ + core/box/../hakmem.h core/box/../hakmem_build_flags.h \ + core/box/../hakmem_config.h core/box/../hakmem_features.h \ + core/box/../hakmem_sys.h core/box/../hakmem_whale.h \ + core/box/../box/ptr_type_box.h core/box/../hakmem_debug_master.h \ + core/box/../tiny_remote.h core/box/../hakmem_tiny_integrity.h \ + core/box/../hakmem_tiny.h core/box/../ptr_track.h \ + core/box/../ptr_trace.h core/box/../hakmem_trace_master.h \ + core/box/../hakmem_stats_master.h core/box/../tiny_debug_ring.h \ + core/box/ss_addr_map_box.h core/box/../superslab/superslab_inline.h \ + core/box/tiny_ptr_bridge_box.h \ core/box/../hakmem_tiny_superslab_internal.h \ core/box/../hakmem_tiny_superslab.h core/box/../box/ss_hot_cold_box.h \ core/box/../box/ss_allocation_box.h core/hakmem_tiny_superslab.h \ @@ -54,6 +58,11 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: +core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/hakmem_build_flags.h: core/tiny_remote.h: @@ -69,10 +78,6 @@ core/tiny_nextptr.h: core/tiny_region_id.h: core/tiny_box_geometry.h: 
core/ptr_track.h: -core/hakmem_super_registry.h: -core/box/ss_addr_map_box.h: -core/box/../hakmem_build_flags.h: -core/box/super_reg_box.h: core/hakmem_tiny.h: core/hakmem_trace.h: core/hakmem_tiny_mini_mag.h: @@ -82,6 +87,7 @@ core/tiny_debug_api.h: core/box/tiny_layout_box.h: core/box/../hakmem_tiny_config.h: core/box/tiny_header_box.h: +core/box/../hakmem_build_flags.h: core/box/tiny_layout_box.h: core/box/../tiny_region_id.h: core/box/ss_hot_cold_box.h: diff --git a/hakmem_sys.d b/hakmem_sys.d index aeb0e49d..5ae03062 100644 --- a/hakmem_sys.d +++ b/hakmem_sys.d @@ -1,6 +1,8 @@ hakmem_sys.o: core/hakmem_sys.c core/hakmem_sys.h core/hakmem_debug.h \ - core/hakmem_env_cache.h core/box/ss_os_acquire_box.h + core/hakmem_env_cache.h core/box/ss_os_acquire_box.h \ + core/box/madvise_guard_box.h core/hakmem_sys.h: core/hakmem_debug.h: core/hakmem_env_cache.h: core/box/ss_os_acquire_box.h: +core/box/madvise_guard_box.h: diff --git a/hakmem_tiny_magazine.d b/hakmem_tiny_magazine.d index 7a3cacb6..8adea833 100644 --- a/hakmem_tiny_magazine.d +++ b/hakmem_tiny_magazine.d @@ -7,18 +7,22 @@ hakmem_tiny_magazine.o: core/hakmem_tiny_magazine.c \ core/superslab/superslab_inline.h core/superslab/superslab_types.h \ core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ + core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/tiny_remote.h core/hakmem_tiny_superslab_constants.h \ - core/hakmem_super_registry.h core/box/ss_addr_map_box.h \ - core/box/../hakmem_build_flags.h core/box/super_reg_box.h \ core/hakmem_prof.h core/hakmem_internal.h core/hakmem.h \ core/hakmem_config.h core/hakmem_features.h core/hakmem_sys.h \ core/hakmem_whale.h 
core/box/tiny_next_ptr_box.h \ core/hakmem_tiny_config.h core/tiny_nextptr.h core/tiny_region_id.h \ core/tiny_box_geometry.h core/ptr_track.h core/tiny_debug_api.h \ core/box/tiny_layout_box.h core/box/../hakmem_tiny_config.h \ - core/box/tiny_header_box.h core/box/tiny_layout_box.h \ - core/box/../tiny_region_id.h core/box/tiny_mem_stats_box.h + core/box/tiny_header_box.h core/box/../hakmem_build_flags.h \ + core/box/tiny_layout_box.h core/box/../tiny_region_id.h \ + core/box/tiny_mem_stats_box.h core/hakmem_tiny_magazine.h: core/hakmem_tiny.h: core/hakmem_build_flags.h: @@ -35,13 +39,14 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: +core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/tiny_remote.h: core/hakmem_tiny_superslab_constants.h: -core/hakmem_super_registry.h: -core/box/ss_addr_map_box.h: -core/box/../hakmem_build_flags.h: -core/box/super_reg_box.h: core/hakmem_prof.h: core/hakmem_internal.h: core/hakmem.h: @@ -59,6 +64,7 @@ core/tiny_debug_api.h: core/box/tiny_layout_box.h: core/box/../hakmem_tiny_config.h: core/box/tiny_header_box.h: +core/box/../hakmem_build_flags.h: core/box/tiny_layout_box.h: core/box/../tiny_region_id.h: core/box/tiny_mem_stats_box.h: diff --git a/hakmem_tiny_query.d b/hakmem_tiny_query.d index 867c2047..01758b49 100644 --- a/hakmem_tiny_query.d +++ b/hakmem_tiny_query.d @@ -7,10 +7,13 @@ hakmem_tiny_query.o: core/hakmem_tiny_query.c core/hakmem_tiny.h \ core/superslab/superslab_inline.h core/superslab/superslab_types.h \ core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ 
+ core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/tiny_remote.h core/hakmem_tiny_superslab_constants.h \ - core/hakmem_super_registry.h core/box/ss_addr_map_box.h \ - core/box/../hakmem_build_flags.h core/box/super_reg_box.h \ core/hakmem_config.h core/hakmem_features.h core/hakmem_tiny.h: core/hakmem_build_flags.h: @@ -28,12 +31,13 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: +core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/tiny_remote.h: core/hakmem_tiny_superslab_constants.h: -core/hakmem_super_registry.h: -core/box/ss_addr_map_box.h: -core/box/../hakmem_build_flags.h: -core/box/super_reg_box.h: core/hakmem_config.h: core/hakmem_features.h: diff --git a/hakmem_tiny_stats.d b/hakmem_tiny_stats.d index 7d490a2b..bcaf4a80 100644 --- a/hakmem_tiny_stats.d +++ b/hakmem_tiny_stats.d @@ -7,7 +7,12 @@ hakmem_tiny_stats.o: core/hakmem_tiny_stats.c core/hakmem_tiny.h \ core/hakmem_tiny_superslab_constants.h core/superslab/superslab_inline.h \ core/superslab/superslab_types.h core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ + core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/tiny_remote.h 
core/hakmem_tiny_superslab_constants.h \ core/hakmem_config.h core/hakmem_features.h core/hakmem_tiny_stats.h core/hakmem_tiny.h: @@ -27,6 +32,11 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: +core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/tiny_remote.h: core/hakmem_tiny_superslab_constants.h: diff --git a/perf.data.mid_zero b/perf.data.mid_zero new file mode 100644 index 00000000..3640d980 Binary files /dev/null and b/perf.data.mid_zero differ diff --git a/perf.data.tiny_front_tf3 b/perf.data.tiny_front_tf3 new file mode 100644 index 00000000..ec0cc1b2 Binary files /dev/null and b/perf.data.tiny_front_tf3 differ diff --git a/perf.data.tiny_front_tf3_on b/perf.data.tiny_front_tf3_on new file mode 100644 index 00000000..c8b85c4b Binary files /dev/null and b/perf.data.tiny_front_tf3_on differ diff --git a/tests/bin/libm_reloc_guard_test b/tests/bin/libm_reloc_guard_test new file mode 100755 index 00000000..76c4af7e Binary files /dev/null and b/tests/bin/libm_reloc_guard_test differ diff --git a/tests/bin/madvise_guard_test b/tests/bin/madvise_guard_test new file mode 100755 index 00000000..119c2205 Binary files /dev/null and b/tests/bin/madvise_guard_test differ diff --git a/tests/unit/libm_reloc_guard_test.c b/tests/unit/libm_reloc_guard_test.c new file mode 100644 index 00000000..6fac597f --- /dev/null +++ b/tests/unit/libm_reloc_guard_test.c @@ -0,0 +1,14 @@ +// tests/unit/libm_reloc_guard_test.c - Box-level test for libm relocation guard +#include <assert.h> +#include <stdlib.h> + +#include "box/libm_reloc_guard_box.h" + +int main(void) { + // Guard should honor env disable before first evaluation.
+ setenv("HAKMEM_LIBM_RELOC_GUARD", "0", 1); + assert(libm_reloc_guard_enabled() == 0); + // Should no-op safely even when called. + libm_reloc_guard_run(); + return 0; +} diff --git a/tests/unit/madvise_guard_test.c b/tests/unit/madvise_guard_test.c new file mode 100644 index 00000000..69f529e4 --- /dev/null +++ b/tests/unit/madvise_guard_test.c @@ -0,0 +1,66 @@ +// tests/unit/madvise_guard_test.c - Box-level tests for madvise guard +#include <assert.h> +#include <stdatomic.h> +#include <stdbool.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <sys/mman.h> +#include <unistd.h> + +#include "box/madvise_guard_box.h" +#include "box/ss_os_acquire_box.h" + +// Provide counter definitions for the guard (standalone unit test). +_Atomic uint64_t g_ss_mmap_count = 0; +_Atomic uint64_t g_final_fallback_mmap_count = 0; +_Atomic uint64_t g_ss_os_alloc_calls = 0; +_Atomic uint64_t g_ss_os_free_calls = 0; +_Atomic uint64_t g_ss_os_madvise_calls = 0; +_Atomic uint64_t g_ss_os_madvise_fail_enomem = 0; +_Atomic uint64_t g_ss_os_madvise_fail_other = 0; +_Atomic uint64_t g_ss_os_huge_alloc_calls = 0; +_Atomic uint64_t g_ss_os_huge_fail_calls = 0; +_Atomic bool g_ss_madvise_disabled = false; + +static void reset_counters(void) { + atomic_store(&g_ss_os_madvise_calls, 0); + atomic_store(&g_ss_os_madvise_fail_enomem, 0); + atomic_store(&g_ss_os_madvise_fail_other, 0); + atomic_store(&g_ss_madvise_disabled, false); +} + +static void test_dso_pointer_is_skipped(void) { + reset_counters(); + int ret = ss_os_madvise_guarded((void*)&ss_os_madvise_guarded, 4096, MADV_DONTNEED, "dso_skip"); + if (ret != 0) { + fprintf(stderr, "madvise_guard returned %d for DSO pointer\n", ret); + exit(1); + } + assert(atomic_load(&g_ss_os_madvise_calls) == 0); + assert(!atomic_load(&g_ss_madvise_disabled)); +} + +static void test_anonymous_region_makes_syscall(void) { + reset_counters(); + long page = sysconf(_SC_PAGESIZE); + if (page <= 0) page = 4096; + + void* mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + assert(mem && mem !=
MAP_FAILED); + + int ret = ss_os_madvise_guarded(mem, (size_t)page, MADV_DONTNEED, "anon_region"); + if (ret != 0) { + fprintf(stderr, "madvise_guard returned %d for anon region\n", ret); + exit(1); + } + assert(atomic_load(&g_ss_os_madvise_calls) == 1); + + munmap(mem, (size_t)page); +} + +int main(void) { + test_dso_pointer_is_skipped(); + test_anonymous_region_makes_syscall(); + return 0; +} diff --git a/tiny_publish.d b/tiny_publish.d index 4d57e248..126a6ca9 100644 --- a/tiny_publish.d +++ b/tiny_publish.d @@ -6,10 +6,15 @@ tiny_publish.o: core/tiny_publish.c core/hakmem_tiny.h \ core/hakmem_tiny_superslab_constants.h core/superslab/superslab_inline.h \ core/superslab/superslab_types.h core/superslab/../tiny_box_geometry.h \ core/superslab/../hakmem_tiny_superslab_constants.h \ - core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \ + core/superslab/../hakmem_tiny_config.h \ + core/superslab/../hakmem_super_registry.h \ + core/superslab/../hakmem_tiny_superslab.h \ + core/superslab/../box/ss_addr_map_box.h \ + core/superslab/../box/../hakmem_build_flags.h \ + core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \ core/tiny_remote.h core/hakmem_tiny_superslab_constants.h \ core/tiny_publish.h core/hakmem_tiny_superslab.h \ - core/hakmem_tiny_stats_api.h + core/hakmem_tiny_stats_api.h core/hakmem_trace_master.h core/hakmem_tiny.h: core/hakmem_build_flags.h: core/hakmem_trace.h: @@ -25,9 +30,15 @@ core/superslab/superslab_types.h: core/superslab/../tiny_box_geometry.h: core/superslab/../hakmem_tiny_superslab_constants.h: core/superslab/../hakmem_tiny_config.h: +core/superslab/../hakmem_super_registry.h: +core/superslab/../hakmem_tiny_superslab.h: +core/superslab/../box/ss_addr_map_box.h: +core/superslab/../box/../hakmem_build_flags.h: +core/superslab/../box/super_reg_box.h: core/tiny_debug_ring.h: core/tiny_remote.h: core/hakmem_tiny_superslab_constants.h: core/tiny_publish.h: core/hakmem_tiny_superslab.h: core/hakmem_tiny_stats_api.h: 
+core/hakmem_trace_master.h:
diff --git a/tiny_remote.d b/tiny_remote.d
index bfc7d22f..bf89846c 100644
--- a/tiny_remote.d
+++ b/tiny_remote.d
@@ -4,7 +4,12 @@ tiny_remote.o: core/tiny_remote.c core/tiny_remote.h \
 core/superslab/superslab_inline.h core/superslab/superslab_types.h \
 core/superslab/../tiny_box_geometry.h \
 core/superslab/../hakmem_tiny_superslab_constants.h \
- core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \
+ core/superslab/../hakmem_tiny_config.h \
+ core/superslab/../hakmem_super_registry.h \
+ core/superslab/../hakmem_tiny_superslab.h \
+ core/superslab/../box/ss_addr_map_box.h \
+ core/superslab/../box/../hakmem_build_flags.h \
+ core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \
 core/hakmem_build_flags.h core/hakmem_tiny_superslab_constants.h
 core/tiny_remote.h:
 core/box/remote_side_box.h:
@@ -16,6 +21,11 @@ core/superslab/superslab_types.h:
 core/superslab/../tiny_box_geometry.h:
 core/superslab/../hakmem_tiny_superslab_constants.h:
 core/superslab/../hakmem_tiny_config.h:
+core/superslab/../hakmem_super_registry.h:
+core/superslab/../hakmem_tiny_superslab.h:
+core/superslab/../box/ss_addr_map_box.h:
+core/superslab/../box/../hakmem_build_flags.h:
+core/superslab/../box/super_reg_box.h:
 core/tiny_debug_ring.h:
 core/hakmem_build_flags.h:
 core/hakmem_tiny_superslab_constants.h:
diff --git a/tiny_sticky.d b/tiny_sticky.d
index b9daf103..b77189f9 100644
--- a/tiny_sticky.d
+++ b/tiny_sticky.d
@@ -6,7 +6,12 @@ tiny_sticky.o: core/tiny_sticky.c core/hakmem_tiny.h \
 core/superslab/superslab_inline.h core/superslab/superslab_types.h \
 core/superslab/../tiny_box_geometry.h \
 core/superslab/../hakmem_tiny_superslab_constants.h \
- core/superslab/../hakmem_tiny_config.h core/tiny_debug_ring.h \
+ core/superslab/../hakmem_tiny_config.h \
+ core/superslab/../hakmem_super_registry.h \
+ core/superslab/../hakmem_tiny_superslab.h \
+ core/superslab/../box/ss_addr_map_box.h \
+ core/superslab/../box/../hakmem_build_flags.h \
+ core/superslab/../box/super_reg_box.h core/tiny_debug_ring.h \
 core/tiny_remote.h core/hakmem_tiny_superslab_constants.h
 core/hakmem_tiny.h:
 core/hakmem_build_flags.h:
@@ -23,6 +28,11 @@ core/superslab/superslab_types.h:
 core/superslab/../tiny_box_geometry.h:
 core/superslab/../hakmem_tiny_superslab_constants.h:
 core/superslab/../hakmem_tiny_config.h:
+core/superslab/../hakmem_super_registry.h:
+core/superslab/../hakmem_tiny_superslab.h:
+core/superslab/../box/ss_addr_map_box.h:
+core/superslab/../box/../hakmem_build_flags.h:
+core/superslab/../box/super_reg_box.h:
 core/tiny_debug_ring.h:
 core/tiny_remote.h:
 core/hakmem_tiny_superslab_constants.h: