Phase 86: Free Path Legacy Mask (NO-GO, +0.25%)
## Summary

Implemented Phase 86 "mask-only commit" optimization for free path:
- Bitset mask (0x7f for C0-C6) to identify LEGACY classes
- Direct call to tiny_legacy_fallback_free_base_with_env()
- No indirect function pointers (avoids Phase 85's -0.86% regression)
- Fail-fast on LARSON_FIX=1 (cross-thread validation incompatibility)

## Results (10-run SSOT)

**NO-GO**: +0.25% improvement (threshold: +1.0%)
- Control: 51,750,467 ops/s (CV: 2.26%)
- Treatment: 51,881,055 ops/s (CV: 2.32%)
- Delta: +0.25% (mean), -0.15% (median)

## Root Cause

Competing optimizations plateau:
1. Phase 9/10 MONO LEGACY (+1.89%) already capture most free path benefit
2. Remaining margin insufficient to overcome:
   - Two branch checks (mask_enabled + has_class)
   - I-cache layout tax in hot path
   - Direct function call overhead

## Phase 85 vs Phase 86

| Metric | Phase 85 | Phase 86 |
|--------|----------|----------|
| Approach | Indirect calls + table | Bitset mask + direct call |
| Result | -0.86% | +0.25% |
| Verdict | NO-GO (regression) | NO-GO (insufficient) |

Phase 86 correctly avoided indirect call penalties but revealed architectural limit: can't escape Phase 9/10 overlay without restructuring.

## Recommendation

Free path optimization layer has reached practical ceiling:
- Phase 9/10 +1.89% + Phase 6/19/FASTLANE +16-27% ≈ 18-29% total
- Further attempts on ceremony elimination face same constraints
- Recommend focus on different optimization layers (malloc, etc.)

## Files Changed

### New
- core/box/free_path_legacy_mask_box.h (API + globals)
- core/box/free_path_legacy_mask_box.c (refresh logic)

### Modified
- core/bench_profile.h (added refresh call)
- core/front/malloc_tiny_fast.h (added Phase 86 fast path check)
- Makefile (added object files)
- CURRENT_TASK.md (documented result)

All changes conditional on HAKMEM_FREE_PATH_LEGACY_MASK=1 (default OFF).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
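The mask test described in the Summary can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in `free_path_legacy_mask_box.h`: the names `g_legacy_mask_enabled`, `g_legacy_mask`, `legacy_mask_for_range`, and `legacy_mask_hit` are hypothetical. Only the 0x7f value for C0-C6 and the two branch checks (mask enabled, class present in mask) come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the globals the commit message mentions. */
static uint8_t g_legacy_mask_enabled = 0; /* master gate (default OFF) */
static uint8_t g_legacy_mask = 0;         /* bit i set => class Ci is LEGACY */

/* Build a mask covering classes [first, last]; C0-C6 yields 0x7f. */
static uint8_t legacy_mask_for_range(unsigned first, unsigned last) {
    assert(first <= last && last < 8);
    uint8_t mask = 0;
    for (unsigned c = first; c <= last; c++) {
        mask |= (uint8_t)(1u << c);
    }
    return mask;
}

/* The two branch checks named under Root Cause: mask_enabled, then has_class.
 * Returns 1 when the free could take the direct LEGACY call. */
static int legacy_mask_hit(unsigned class_idx) {
    if (!g_legacy_mask_enabled) return 0;
    if (class_idx >= 8) return 0;
    return (int)((g_legacy_mask >> class_idx) & 1u);
}
```

A caller would test `legacy_mask_hit(class_idx)` and, on a hit, call the legacy free function directly instead of dispatching through a function-pointer table, which is the difference from Phase 85 that the comparison table records.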
@@ -1,5 +1,22 @@
# CURRENT_TASK (Rolling, SSOT)

## Phase 86 (Closed: NO-GO)

**Status**: ❌ NO-GO (+0.25% improvement, threshold: +1.0%)

**A/B Test (10-run SSOT)**:
- Control: 51,750,467 ops/s (CV: 2.26%)
- Treatment: 51,881,055 ops/s (CV: 2.32%)
- Delta: +0.25% (mean), -0.15% (median)

**Summary**: Free path legacy mask (mask-only) optimization for LEGACY classes.
- Design: Bitset mask + direct call (avoids Phase 85's indirect call problems)
- Implementation: Correct (0x7f mask computed, C0-C6 optimized)
- Root cause: Competing Phase 9/10 optimizations (+1.89%) already capture most benefit
- Conclusion: Free path optimization layer has reached practical ceiling

---

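The Delta and Status figures above follow from simple arithmetic on the two means. The sketch below reproduces it; the helper names `delta_percent` and `is_go` are hypothetical, and the "mean delta must reach the threshold" rule is an assumption inferred from the "+0.25% improvement, threshold: +1.0%" wording rather than a documented procedure.

```c
#include <assert.h>

/* Relative change of treatment vs. control, in percent. */
static double delta_percent(double control, double treatment) {
    assert(control > 0.0);
    return (treatment - control) / control * 100.0;
}

/* Assumed decision rule: GO only if the mean delta reaches the threshold. */
static int is_go(double delta_pct, double threshold_pct) {
    return delta_pct >= threshold_pct;
}
```

With the reported means, `delta_percent(51750467.0, 51881055.0)` is about 0.25, so `is_go(..., 1.0)` is false. The delta is also far smaller than either run's CV (2.26% and 2.32%), which is consistent with the mean and median disagreeing in sign.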
## 0) Current source of truth (SSOT)

- **Canonical performance comparison**: FAST PGO build (`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`) + **WarmPool=16**
@@ -29,6 +46,7 @@
- Keep reproduction logs (the minimum when chasing a few percent):
- `scripts/bench_ssot_capture.sh`
- `HAKMEM_BENCH_ENV_LOG=1` (records CPU governor/EPP/freq)
- External consultation (paste-in packet): `docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md` (generated by: `scripts/make_chatgpt_pro_packet_free_path.sh`)

## 0b) Allocator comparison (reference)

@@ -87,6 +105,11 @@
- **Phase 82 (hardening)**: C2 local cache fully excluded from the hot path (even with the environment variable set, the alloc/free hot path never touches it)
- Record: `docs/analysis/PHASE82_C2_LOCAL_CACHE_HOTPATH_EXCLUSION.md`

- **Phase 85 (Free path commit-once, LEGACY-only)**: `HAKMEM_FREE_PATH_COMMIT_ONCE=0/1`
- Result: **NO-GO (-0.86%)** → **research box freeze (default OFF)**
- Reason: Its effect overlaps with Phase 10 (MONO LEGACY DIRECT), and it added indirect-call/layout tax on top
- Record: `docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_RESULTS.md`

## 4) Next instructions (Active)

### Phase 74 (structure): Shorten the UnifiedCache hit-path ✅ **P1 (LOCALIZE) frozen**
6
Makefile
@@ -253,7 +253,7 @@ LDFLAGS += $(EXTRA_LDFLAGS)

# Targets
TARGET = test_hakmem
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
OBJS = $(OBJS_BASE)

# Shared library
@@ -287,7 +287,7 @@ endif
# Benchmark targets
BENCH_HAKMEM = bench_allocators_hakmem
BENCH_SYSTEM = bench_allocators_system
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
@@ -464,7 +464,7 @@ test-box-refactor: box-refactor
	./larson_hakmem 10 8 128 1024 1 12345 4

# Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o 
core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o 
core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
@@ -17,6 +17,8 @@
#include "box/fastlane_direct_env_box.h" // fastlane_direct_env_refresh_from_env (Phase 19-1)
#include "box/tiny_header_hotfull_env_box.h" // tiny_header_hotfull_env_refresh_from_env (Phase 21)
#include "box/tiny_inline_slots_fixed_mode_box.h" // tiny_inline_slots_fixed_mode_refresh_from_env (Phase 78-1)
#include "box/free_path_commit_once_fixed_box.h" // free_path_commit_once_refresh_from_env (Phase 85)
#include "box/free_path_legacy_mask_box.h" // free_path_legacy_mask_refresh_from_env (Phase 86)
#endif

// Set a default value only when the env var is unset
@@ -235,5 +237,9 @@ static inline void bench_apply_profile(void) {
    tiny_header_hotfull_env_refresh_from_env();
    // Phase 78-1: Optionally pin C3/C4/C5/C6 inline-slots modes (avoid per-op ENV gates).
    tiny_inline_slots_fixed_mode_refresh_from_env();
    // Phase 85: Optionally commit-once for C4-C7 LEGACY free path (skip policy/route/mono ceremony).
    free_path_commit_once_refresh_from_env();
    // Phase 86: Optionally use legacy mask for early exit (no indirect calls, just bit test).
    free_path_legacy_mask_refresh_from_env();
#endif
}
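The refresh calls above resolve an environment gate once at profile-apply time, so the hot path does not call `getenv` per operation. The sketch below shows that pattern in isolation. It is a hedged illustration, not the contents of `free_path_legacy_mask_box.c`, which does not appear in this excerpt: the helpers `env_flag_is_on`, `resolve_gate`, and `refresh_gate_from_env` are hypothetical, and it assumes Phase 86 mirrors the Phase 85 parsing rule and the `HAKMEM_TINY_LARSON_FIX` fail-fast.

```c
#include <stdint.h>
#include <stdlib.h>

/* Same truthiness rule as the Phase 85 refresh code: set, non-empty, and
 * not starting with '0'. */
static int env_flag_is_on(const char* v) {
    return (v && *v && *v != '0') ? 1 : 0;
}

/* Decide the gate from raw env strings. Taking strings as parameters keeps
 * the logic testable without touching the process environment. */
static uint8_t resolve_gate(const char* requested_env, const char* larson_env) {
    if (!env_flag_is_on(requested_env)) return 0; /* default OFF */
    if (env_flag_is_on(larson_env)) return 0;     /* fail-fast: incompatible */
    return 1;
}

/* Assumed wiring: read both variables once and cache the result. */
static uint8_t refresh_gate_from_env(void) {
    return resolve_gate(getenv("HAKMEM_FREE_PATH_LEGACY_MASK"),
                        getenv("HAKMEM_TINY_LARSON_FIX"));
}
```

Note that under this rule any value not beginning with `'0'` enables the flag, so `HAKMEM_FREE_PATH_LEGACY_MASK=off` would still count as ON.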
105
core/box/free_path_commit_once_fixed_box.c
Normal file
@@ -0,0 +1,105 @@
// free_path_commit_once_fixed_box.c - Phase 85: Free Path Commit-Once (LEGACY-only)

#include "free_path_commit_once_fixed_box.h"

#include <stdlib.h>
#include <stdio.h>
#include "tiny_route_env_box.h"
#include "free_policy_fast_v2_box.h"
#include "tiny_legacy_fallback_box.h"
#include "hakmem_build_flags.h"

#define TINY_C4 4
#define TINY_C7 7

// ============================================================================
// Global state
// ============================================================================

uint8_t g_free_path_commit_once_enabled = 0;
struct FreePatchCommitOnceEntry g_free_path_commit_once_entries[4] = {0};

// ============================================================================
// Refresh from ENV (called by bench_profile)
// ============================================================================

void free_path_commit_once_refresh_from_env(void) {
    // 1. Read master ENV gate
    const char* env_val = getenv("HAKMEM_FREE_PATH_COMMIT_ONCE");
    int requested = (env_val && *env_val && *env_val != '0') ? 1 : 0;

    if (!requested) {
        g_free_path_commit_once_enabled = 0;
        return;
    }

    // 2. Fail-fast: LARSON_FIX incompatible with commit-once
    // owner_tid validation must happen on every free, cannot commit-once
    const char* larson_env = getenv("HAKMEM_TINY_LARSON_FIX");
    int larson_fix_enabled = (larson_env && *larson_env && *larson_env != '0') ? 1 : 0;

    if (larson_fix_enabled) {
#if !HAKMEM_BUILD_RELEASE
        fprintf(stderr, "[FREE_PATH_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n");
        fflush(stderr);
#endif
        g_free_path_commit_once_enabled = 0;
        return;
    }

    // 3. Ensure route snapshot is initialized
    tiny_route_snapshot_init();

    // 4. Get nonlegacy mask (classes that use ULTRA/MID/V7)
    uint8_t nonlegacy_mask = free_policy_fast_v2_nonlegacy_mask();

    // 5. For each C4-C7 class, determine if it can commit-once
    // Commit-once is safe if:
    //   - Class is NOT in nonlegacy_mask (implies LEGACY route)
    //   - Route snapshot confirms TINY_ROUTE_LEGACY
    for (int i = 0; i < 4; i++) {
        unsigned class_idx = TINY_C4 + i;
        struct FreePatchCommitOnceEntry* entry = &g_free_path_commit_once_entries[i];

        // Initialize entry
        entry->can_commit = 0;
        entry->handler = NULL;

        // Check if class is in nonlegacy mask
        if ((nonlegacy_mask & (1u << class_idx)) != 0) {
            // Class uses non-legacy path (ULTRA/MID/V7)
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check route snapshot
|
||||||
|
tiny_route_kind_t route = tiny_route_for_class((uint8_t)class_idx);
|
||||||
|
if (route != TINY_ROUTE_LEGACY) {
|
||||||
|
// Unexpected route (should not happen if nonlegacy_mask is correct)
|
||||||
|
#if !HAKMEM_BUILD_RELEASE
|
||||||
|
fprintf(stderr, "[FREE_PATH_COMMIT_ONCE] FAIL-FAST: C%u route=%d not LEGACY, disabling\n",
|
||||||
|
class_idx, (int)route);
|
||||||
|
fflush(stderr);
|
||||||
|
#endif
|
||||||
|
g_free_path_commit_once_enabled = 0;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Route is LEGACY and class not in nonlegacy_mask: safe to commit-once
|
||||||
|
entry->can_commit = 1;
|
||||||
|
entry->handler = tiny_legacy_fallback_free_base_with_env;
|
||||||
|
|
||||||
|
#if !HAKMEM_BUILD_RELEASE
|
||||||
|
fprintf(stderr, "[FREE_PATH_COMMIT_ONCE] C%u committed (handler=%p)\n",
|
||||||
|
class_idx, (void*)entry->handler);
|
||||||
|
fflush(stderr);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
// 6. All checks passed, enable commit-once
|
||||||
|
g_free_path_commit_once_enabled = 1;
|
||||||
|
|
||||||
|
#if !HAKMEM_BUILD_RELEASE
|
||||||
|
fprintf(stderr, "[FREE_PATH_COMMIT_ONCE] Enabled (nonlegacy_mask=0x%02x, LARSON_FIX=0)\n", nonlegacy_mask);
|
||||||
|
fflush(stderr);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
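Note on the ENV gate used by the refresh function above: a variable counts as "on" when it is set, non-empty, and its first character is not `0`. The following is a minimal sketch of that rule as a standalone helper. `env_flag_enabled`, `env_flag_demo`, and the `DEMO_FLAG` variable are hypothetical names introduced for illustration; they are not part of the commit.

```c
#define _POSIX_C_SOURCE 200112L  // for setenv/unsetenv under strict -std modes
#include <assert.h>
#include <stdlib.h>

// Hypothetical helper (not in the commit): returns 1 when the variable is
// set, non-empty, and its first character is not '0'; returns 0 otherwise.
static int env_flag_enabled(const char* name) {
    const char* v = getenv(name);
    return (v && *v && *v != '0') ? 1 : 0;
}

// Small self-check on a made-up variable name.
static void env_flag_demo(void) {
    unsetenv("DEMO_FLAG");
    assert(env_flag_enabled("DEMO_FLAG") == 0);  // unset: disabled
    setenv("DEMO_FLAG", "1", 1);
    assert(env_flag_enabled("DEMO_FLAG") == 1);  // "1": enabled
    setenv("DEMO_FLAG", "0", 1);
    assert(env_flag_enabled("DEMO_FLAG") == 0);  // "0": disabled
}
```

Only the first character is inspected under this rule, so a value such as `yes` enables the gate and a value such as `01` disables it.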
49
core/box/free_path_commit_once_fixed_box.h
Normal file
@@ -0,0 +1,49 @@
// free_path_commit_once_fixed_box.h - Phase 85: Free Path Commit-Once (LEGACY-only)
//
// Goal: Eliminate per-operation policy/route/mono ceremony overhead for C4-C7 LEGACY classes
//       by pre-computing route+handler at init-time.
//
// Design (Box Theory, adapted from Phase 78-1):
// - Single boundary: bench_profile calls free_path_commit_once_refresh_from_env()
//   after applying presets.
// - Cache: Pre-compute for each C4-C7 class whether it can use commit-once path
//   (must be LEGACY route AND LARSON_FIX disabled)
// - Hot path: If commit-once enabled and class in commit set, skip Phase 9/10/policy/route
//   ceremony and call handler directly.
// - Reversible: toggle HAKMEM_FREE_PATH_COMMIT_ONCE=0/1.
//
// Fail-fast: If HAKMEM_TINY_LARSON_FIX=1, disable commit-once (owner_tid validation
//            incompatible with early exit).
//
// ENV:
// - HAKMEM_FREE_PATH_COMMIT_ONCE=0/1 (default 0)

#ifndef HAK_BOX_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H
#define HAK_BOX_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H

#include <stdint.h>
#include "tiny_route_env_box.h"

// Forward declaration: handler function pointer
typedef void (*FreeTinyHandler)(void* base, uint32_t class_idx, const struct HakmemEnvSnapshot* env);

// Cached entry for a single class (C4-C7)
struct FreePatchCommitOnceEntry {
    uint8_t can_commit;       // 1 if this class can use commit-once, 0 otherwise
    FreeTinyHandler handler;  // Handler function pointer (if can_commit=1)
};

// Refresh (single boundary): bench_profile calls this after putenv defaults.
void free_path_commit_once_refresh_from_env(void);

// Cached state (read in hot path).
extern uint8_t g_free_path_commit_once_enabled;
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_entries[4];  // C4-C7

// Fast-path API (inlined)
__attribute__((always_inline))
static inline int free_path_commit_once_enabled_fast(void) {
    return (int)g_free_path_commit_once_enabled;
}

#endif // HAK_BOX_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H

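Note on the Phase 85 design described in the header above: each C4-C7 class gets a cached entry holding a flag and a handler pointer, and the hot path dispatches through that pointer. The following is a simplified sketch of the pattern, not the commit's code. All `demo_*` names are hypothetical, and the handler signature drops the env snapshot argument for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in; the real handler also takes an env snapshot pointer.
typedef void (*demo_handler_fn)(void* base, uint32_t class_idx);

struct demo_commit_entry {
    uint8_t can_commit;       // 1 if the class may take the committed path
    demo_handler_fn handler;  // valid only when can_commit == 1
};

static uint8_t g_demo_enabled = 0;
static struct demo_commit_entry g_demo_entries[4];  // C4..C7
static uint32_t g_demo_last_class = UINT32_MAX;

static void demo_legacy_free(void* base, uint32_t class_idx) {
    (void)base;
    g_demo_last_class = class_idx;  // record the call for the demo
}

// Init-time: commit every class whose bit is clear in nonlegacy_mask.
static void demo_commit_refresh(uint8_t nonlegacy_mask) {
    for (unsigned i = 0; i < 4; i++) {
        unsigned class_idx = 4u + i;
        int legacy = (nonlegacy_mask & (1u << class_idx)) == 0;
        g_demo_entries[i].can_commit = legacy ? 1 : 0;
        g_demo_entries[i].handler = legacy ? demo_legacy_free : NULL;
    }
    g_demo_enabled = 1;
}

// Hot path: returns 1 if the free was handled through the cached handler.
static int demo_commit_try_free(void* base, unsigned class_idx) {
    if (!g_demo_enabled || class_idx < 4u || class_idx > 7u) return 0;
    const struct demo_commit_entry* e = &g_demo_entries[class_idx - 4u];
    if (!e->can_commit) return 0;
    e->handler(base, (uint32_t)class_idx);  // indirect call via the table
    return 1;
}

static void demo_commit_check(void) {
    int obj = 0;
    demo_commit_refresh(0x80);  // pretend C7 is non-legacy
    assert(demo_commit_try_free(&obj, 5) == 1 && g_demo_last_class == 5);
    assert(demo_commit_try_free(&obj, 7) == 0);  // C7 not committed
    assert(demo_commit_try_free(&obj, 2) == 0);  // outside C4-C7
}
```

The table load plus indirect call on every hit is the cost that the Phase 86 variant tries to avoid with a single bit test and a direct call.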
88
core/box/free_path_legacy_mask_box.c
Normal file
@@ -0,0 +1,88 @@
// free_path_legacy_mask_box.c - Phase 86: Free Path Legacy Mask (mask-only)

#include "free_path_legacy_mask_box.h"

#include <stdlib.h>
#include <stdio.h>
#include "tiny_route_env_box.h"
#include "free_policy_fast_v2_box.h"
#include "tiny_c7_ultra_box.h"
#include "hakmem_build_flags.h"

#define TINY_C0 0
#define TINY_C7 7

// ============================================================================
// Global state
// ============================================================================

uint8_t g_free_legacy_mask_enabled = 0;
uint8_t g_free_legacy_mask = 0;

// ============================================================================
// Refresh from ENV (called by bench_profile)
// ============================================================================

void free_path_legacy_mask_refresh_from_env(void) {
    // 1. Read master ENV gate
    const char* env_val = getenv("HAKMEM_FREE_PATH_LEGACY_MASK");
    int requested = (env_val && *env_val && *env_val != '0') ? 1 : 0;

    if (!requested) {
        g_free_legacy_mask_enabled = 0;
        return;
    }

    // 2. Fail-fast: LARSON_FIX incompatible
    // owner_tid validation must happen on every free, cannot commit-once
    const char* larson_env = getenv("HAKMEM_TINY_LARSON_FIX");
    int larson_fix_enabled = (larson_env && *larson_env && *larson_env != '0') ? 1 : 0;

    if (larson_fix_enabled) {
#if !HAKMEM_BUILD_RELEASE
        fprintf(stderr, "[FREE_LEGACY_MASK] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n");
        fflush(stderr);
#endif
        g_free_legacy_mask_enabled = 0;
        return;
    }

    // 3. Ensure route snapshot is initialized
    tiny_route_snapshot_init();

    // 4. Get nonlegacy mask (classes that use ULTRA/MID/V7)
    uint8_t nonlegacy_mask = free_policy_fast_v2_nonlegacy_mask();

    // 5. Check if C7 ULTRA is enabled (special case: C7 has ULTRA fast path)
    int c7_ultra_enabled = tiny_c7_ultra_enabled_env();

    // 6. Compute legacy_mask: bit i = 1 if class i is LEGACY (not in nonlegacy_mask)
    //    and route confirms LEGACY
    uint8_t mask = 0;
    for (unsigned i = TINY_C0; i <= TINY_C7; i++) {
        // Skip if class is in non-legacy mask (ULTRA/MID/V7 active)
        if (nonlegacy_mask & (1u << i)) {
            continue;
        }

        // Skip if C7 and ULTRA is enabled (C7 ULTRA has dedicated fast path)
        if (i == 7 && c7_ultra_enabled) {
            continue;
        }

        // Check route snapshot
        tiny_route_kind_t route = tiny_route_for_class((uint8_t)i);
        if (route == TINY_ROUTE_LEGACY) {
            mask |= (1u << i);
        }
    }

    g_free_legacy_mask = mask;
    g_free_legacy_mask_enabled = 1;

#if !HAKMEM_BUILD_RELEASE
    fprintf(stderr, "[FREE_LEGACY_MASK] enabled=1 mask=0x%02x nonlegacy=0x%02x c7_ultra=%d larson=0\n",
            mask, nonlegacy_mask, c7_ultra_enabled);
    fflush(stderr);
#endif
}

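Note on step 6 of the refresh function above: the mask is built by clearing every class that is non-legacy, that is C7 with ULTRA enabled, or whose route is not LEGACY. The sketch below isolates that computation as a pure function so the resulting values can be checked by hand. The `demo_*` names and the two-value route enum are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical two-value route enum standing in for tiny_route_kind_t.
enum demo_route { DEMO_ROUTE_LEGACY = 0, DEMO_ROUTE_OTHER = 1 };

// Bit i of the result is set when class i is not in nonlegacy_mask, is not
// C7-with-ULTRA, and its route is LEGACY.
static uint8_t demo_compute_legacy_mask(uint8_t nonlegacy_mask,
                                        const enum demo_route routes[8],
                                        int c7_ultra_enabled) {
    uint8_t mask = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (nonlegacy_mask & (1u << i)) continue;
        if (i == 7 && c7_ultra_enabled) continue;
        if (routes[i] == DEMO_ROUTE_LEGACY) mask |= (uint8_t)(1u << i);
    }
    return mask;
}

static void demo_mask_check(void) {
    enum demo_route routes[8] = {0};  // all DEMO_ROUTE_LEGACY
    assert(demo_compute_legacy_mask(0x00, routes, 1) == 0x7f);  // C0-C6
    assert(demo_compute_legacy_mask(0x00, routes, 0) == 0xff);  // C0-C7
    assert(demo_compute_legacy_mask(0x30, routes, 1) == 0x4f);  // C4/C5 excluded
}
```

With every class on the LEGACY route and C7 ULTRA enabled, the sketch yields `0x7f`, which is consistent with the C0-C6 mask value quoted in the commit message.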
46
core/box/free_path_legacy_mask_box.h
Normal file
@@ -0,0 +1,46 @@
// free_path_legacy_mask_box.h - Phase 86: Free Path Legacy Mask (mask-only, no indirect calls)
//
// Goal: Achieve Phase 10 effect (skip ceremony for LEGACY classes) with lower cost by:
// - Computing legacy_mask at init-time (bench_profile boundary)
// - Avoiding indirect call overhead (no function pointers)
// - Single direct call to tiny_legacy_fallback_free_base_with_env()
// - No table lookups in hot path (just bit test)
//
// Design (Box Theory):
// - Single boundary: bench_profile calls free_path_legacy_mask_refresh_from_env()
//   after applying presets (putenv defaults).
// - Cache: legacy_mask (bitset, 1 bit per class C0-C7)
// - Hot path: If enabled and (mask & (1 << class_idx)), skip policy/route/mono ceremony
//   and call tiny_legacy_fallback_free_base_with_env() directly.
// - Reversible: toggle HAKMEM_FREE_PATH_LEGACY_MASK=0/1.
//
// Fail-fast: If HAKMEM_TINY_LARSON_FIX=1, disable (cross-thread owner_tid validation needed).
//
// ENV:
// - HAKMEM_FREE_PATH_LEGACY_MASK=0/1 (default 0)

#ifndef HAK_BOX_FREE_PATH_LEGACY_MASK_BOX_H
#define HAK_BOX_FREE_PATH_LEGACY_MASK_BOX_H

#include <stdint.h>

// Refresh (single boundary): bench_profile calls this after putenv defaults.
void free_path_legacy_mask_refresh_from_env(void);

// Cached state (read in hot path).
extern uint8_t g_free_legacy_mask_enabled;
extern uint8_t g_free_legacy_mask;  // Bitset: bit i = 1 if class i is LEGACY and can skip ceremony

// Fast-path API (inlined, no fallback needed).
__attribute__((always_inline))
static inline int free_path_legacy_mask_enabled_fast(void) {
    return (int)g_free_legacy_mask_enabled;
}

__attribute__((always_inline))
static inline int free_path_legacy_mask_has_class(unsigned class_idx) {
    if (__builtin_expect(class_idx >= 8, 0)) return 0;
    return (g_free_legacy_mask & (1u << class_idx)) ? 1 : 0;
}

#endif // HAK_BOX_FREE_PATH_LEGACY_MASK_BOX_H

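Note on the hot-path side of the header above: the check reduces to one enabled flag, one bit test, and one direct call. The sketch below shows that shape in isolation. The `demo_*` names and the call counter are hypothetical; the real direct call target is `tiny_legacy_fallback_free_base_with_env()`.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t g_demo_mask_enabled = 0;
static uint8_t g_demo_mask = 0;
static unsigned g_demo_direct_calls = 0;

// Hypothetical stand-in for the legacy free handler.
static void demo_legacy_free_direct(void* base, uint32_t class_idx) {
    (void)base;
    (void)class_idx;
    g_demo_direct_calls++;
}

static inline int demo_mask_has_class(unsigned class_idx) {
    if (class_idx >= 8) return 0;  // keeps the shift within the 8-bit mask
    return (g_demo_mask & (1u << class_idx)) ? 1 : 0;
}

// Returns 1 when the early exit handled the free, 0 when the caller should
// continue with the regular policy/route path.
static int demo_free_fast(void* base, unsigned class_idx) {
    if (g_demo_mask_enabled && demo_mask_has_class(class_idx)) {
        demo_legacy_free_direct(base, (uint32_t)class_idx);  // direct call, no table
        return 1;
    }
    return 0;
}

static void demo_fast_check(void) {
    int obj = 0;
    g_demo_mask = 0x7f;
    g_demo_mask_enabled = 1;
    assert(demo_free_fast(&obj, 3) == 1);   // C3 is in the mask
    assert(demo_free_fast(&obj, 7) == 0);   // C7 is not
    assert(demo_free_fast(&obj, 40) == 0);  // out-of-range class is rejected
    assert(g_demo_direct_calls == 1);
}
```

Even in this reduced form the hit path still pays two conditional branches before the call, which matches the "two branch checks" cost listed in the commit message's root-cause section.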
@@ -74,6 +74,8 @@
 #include "../box/free_cold_shape_stats_box.h" // Phase 5 E5-3a: Free cold shape stats
 #include "../box/free_tiny_fast_mono_dualhot_env_box.h" // Phase 9: MONO DUALHOT ENV gate
 #include "../box/free_tiny_fast_mono_legacy_direct_env_box.h" // Phase 10: MONO LEGACY DIRECT ENV gate
+#include "../box/free_path_commit_once_fixed_box.h" // Phase 85: Free path commit-once (LEGACY-only)
+#include "../box/free_path_legacy_mask_box.h" // Phase 86: Free path legacy mask (mask-only, no indirect calls)
 #include "../box/alloc_passdown_ssot_env_box.h" // Phase 60: Alloc pass-down SSOT
 
 // Helper: current thread id (low 32 bits) for owner check
@@ -955,6 +957,39 @@ static inline int free_tiny_fast(void* ptr) {
     // Phase 19-3b: Consolidate ENV snapshot reads (capture once per free_tiny_fast call).
     const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL;
 
+    // Phase 86: Free path legacy mask - Direct early exit for LEGACY classes (no indirect calls)
+    // Conditions:
+    // - ENV: HAKMEM_FREE_PATH_LEGACY_MASK=1
+    // - class_idx in legacy_mask (LEGACY route, not ULTRA/MID/V7)
+    // - LARSON_FIX=0 (checked at startup, fail-fast if enabled)
+    if (__builtin_expect(free_path_legacy_mask_enabled_fast(), 0)) {
+        if (__builtin_expect(free_path_legacy_mask_has_class((unsigned)class_idx), 0)) {
+            // Direct path: Call legacy handler without policy snapshot, route, or mono checks
+            tiny_legacy_fallback_free_base_with_env(base, (uint32_t)class_idx, env);
+            return 1;
+        }
+    }
+
+    // Phase 85: Free path commit-once (LEGACY-only) - Skip policy/route/mono ceremony for committed C4-C7
+    // Conditions:
+    // - ENV: HAKMEM_FREE_PATH_COMMIT_ONCE=1
+    // - class_idx in C4-C7 (129-256B LEGACY classes)
+    // - Pre-computed at startup that class can use commit-once
+    // - LARSON_FIX=0 (checked at startup, fail-fast if enabled)
+    if (__builtin_expect(free_path_commit_once_enabled_fast(), 0)) {
+        if (__builtin_expect((unsigned)class_idx >= 4u && (unsigned)class_idx <= 7u, 0)) {
+            const unsigned cache_idx = (unsigned)class_idx - 4u;
+            const struct FreePatchCommitOnceEntry* entry = &g_free_path_commit_once_entries[cache_idx];
+
+            if (__builtin_expect(entry->can_commit, 0)) {
+                // Direct path: Call handler without policy snapshot, route, or mono checks
+                FREE_PATH_STAT_INC(commit_once_hit);
+                entry->handler(base, (uint32_t)class_idx, env);
+                return 1;
+            }
+        }
+    }
+
     // Phase 9: MONO DUALHOT early-exit for C0-C3 (skip policy snapshot, direct to legacy)
     // Conditions:
     // - ENV: HAKMEM_FREE_TINY_FAST_MONO_DUALHOT=1
@@ -46,3 +46,10 @@ Allocator comparisons include layout tax, so treat them as **reference** only.
 left as-is, the benchmark can drift toward the "slow side".
 
 First, make sure to compare only runs whose `HAKMEM_BENCH_ENV_LOG=1` output is **identical**.
+
+## 6) External review (paste packet)
+
+For the "compress the code and paste it" use case, use packet generation to cut down on repeated manual work:
+
+- Generator script: `scripts/make_chatgpt_pro_packet_free_path.sh`
+- Output (snapshot): `docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md`
555
docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md
Normal file
@@ -0,0 +1,555 @@
<!--
NOTE: This file is a snapshot for copy/paste review.
Regenerate with:
  scripts/make_chatgpt_pro_packet_free_path.sh > docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md
-->

# Hakmem free-path review packet (compact)

Goal: understand remaining fixed costs vs mimalloc/tcmalloc, with Box Theory (single boundary, reversible ENV gates).

SSOT bench conditions (current practice):
- `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE`
- `ITERS=20000000 WS=400 RUNS=10`
- run via `scripts/run_mixed_10_cleanenv.sh`

Request:
1) Where is the dominant fixed cost on free path now?
2) What structural change would give +5–10% without breaking Box Theory?
3) What NOT to do (layout tax pitfalls)?

## Code excerpts (clipped)

### `core/box/tiny_free_gate_box.h`
```c
static inline int tiny_free_gate_try_fast(void* user_ptr)
{
#if !HAKMEM_TINY_HEADER_CLASSIDX
    (void)user_ptr;
    // In header-disabled builds, the Tiny Fast Path itself is not used
    return 0;
#else
    if (__builtin_expect(!user_ptr, 0)) {
        return 0;
    }

    // Layer 3a: lightweight Fail-Fast (always ON)
    // Obviously invalid addresses (extremely small values) are not handled on the Fast Path.
    // Leave them to the Slow Path (hak_free_at + registry/header).
    {
        uintptr_t addr = (uintptr_t)user_ptr;
        if (__builtin_expect(addr < 4096, 0)) {
#if !HAKMEM_BUILD_RELEASE
            static _Atomic uint32_t g_free_gate_range_invalid = 0;
            uint32_t n = atomic_fetch_add_explicit(&g_free_gate_range_invalid, 1, memory_order_relaxed);
            if (n < 8) {
                fprintf(stderr,
                        "[TINY_FREE_GATE_RANGE_INVALID] ptr=%p\n",
                        user_ptr);
                fflush(stderr);
            }
#endif
            return 0;
        }
    }

    // Future extension point:
    // - Run Bridge + Guard only when DIAG is ON, and
    //   skip the Fast Path if the pointer is judged not to be Tiny-managed.
#if !HAKMEM_BUILD_RELEASE
    if (__builtin_expect(tiny_free_gate_diag_enabled(), 0)) {
        TinyFreeGateContext ctx;
        if (!tiny_free_gate_classify(user_ptr, &ctx)) {
            // Not Tiny-managed or Bridge failed → do not use the Fast Path
            return 0;
        }
        (void)ctx; // Log-only for now. Guards will be inserted here in the future.
    }
#endif

    // Delegate the actual work to the existing ultra-fast free (behavior unchanged)
    return hak_tiny_free_fast_v2(user_ptr);
#endif
}
```

### `core/front/malloc_tiny_fast.h`
```c
static inline int free_tiny_fast(void* ptr) {
    if (__builtin_expect(!ptr, 0)) return 0;

#if HAKMEM_TINY_HEADER_CLASSIDX
    // 1. Page-boundary guard:
    // If ptr is at the start of a page (offset==0), ptr-1 may fall in a different page or an unmapped region.
    // In that case, skip the header read and fall back to the normal free path.
    uintptr_t off = (uintptr_t)ptr & 0xFFFu;
    if (__builtin_expect(off == 0, 0)) {
        return 0;
    }

    // 2. Fast header magic validation (required)
    // In Release builds tiny_region_id_read_header() omits the magic check,
    // so validate the Tiny-specific header (0xA0) here ourselves.
    uint8_t* header_ptr = (uint8_t*)ptr - 1;
    uint8_t header = *header_ptr;
    uint8_t magic = header & 0xF0u;
    if (__builtin_expect(magic != HEADER_MAGIC, 0)) {
        // Not a Tiny header → Mid/Large/external pointer, so take the normal free path
        return 0;
    }

    // 3. Extract class_idx (low 4 bits)
    int class_idx = (int)(header & HEADER_CLASS_MASK);
    if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
        return 0;
    }

    // 4. Compute BASE and push to the Unified Cache
    void* base = tiny_user_to_base_inline(ptr);
    tiny_front_free_stat_inc(class_idx);

    // Phase FREE-LEGACY-BREAKDOWN-1: counter instrumentation (1. function entry)
    FREE_PATH_STAT_INC(total_calls);

    // Phase 19-3b: Consolidate ENV snapshot reads (capture once per free_tiny_fast call).
    const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL;

    // Phase 9: MONO DUALHOT early-exit for C0-C3 (skip policy snapshot, direct to legacy)
    // Conditions:
    // - ENV: HAKMEM_FREE_TINY_FAST_MONO_DUALHOT=1
    // - class_idx <= 3 (C0-C3)
    // - !HAKMEM_TINY_LARSON_FIX (cross-thread handling requires full validation)
    // - g_tiny_route_snapshot_done == 1 && route == TINY_ROUTE_LEGACY (use the existing path when this cannot be asserted)
    if ((unsigned)class_idx <= 3u) {
        if (free_tiny_fast_mono_dualhot_enabled()) {
            static __thread int g_larson_fix = -1;
            if (__builtin_expect(g_larson_fix == -1, 0)) {
                const char* e = getenv("HAKMEM_TINY_LARSON_FIX");
                g_larson_fix = (e && *e && *e != '0') ? 1 : 0;
            }

            if (!g_larson_fix &&
                g_tiny_route_snapshot_done == 1 &&
                g_tiny_route_class[class_idx] == TINY_ROUTE_LEGACY) {
                // Direct path: Skip policy snapshot, go straight to legacy fallback
                FREE_PATH_STAT_INC(mono_dualhot_hit);
                tiny_legacy_fallback_free_base_with_env(base, (uint32_t)class_idx, env);
                return 1;
            }
        }
    }

    // Phase 10: MONO LEGACY DIRECT early-exit for C4-C7 (skip policy snapshot, direct to legacy)
    // Conditions:
    // - ENV: HAKMEM_FREE_TINY_FAST_MONO_LEGACY_DIRECT=1
    // - cached nonlegacy_mask: class is NOT in non-legacy mask (= ULTRA/MID/V7 not active)
    // - g_tiny_route_snapshot_done == 1 && route == TINY_ROUTE_LEGACY (use the existing path when this cannot be asserted)
    // - !HAKMEM_TINY_LARSON_FIX (cross-thread handling requires full validation)
    if (free_tiny_fast_mono_legacy_direct_enabled()) {
        // 1. Check nonlegacy mask (computed once at init)
        uint8_t nonlegacy_mask = free_tiny_fast_mono_legacy_direct_nonlegacy_mask();
        if ((nonlegacy_mask & (1u << class_idx)) == 0) {
            // 2. Check route snapshot
            if (g_tiny_route_snapshot_done == 1 && g_tiny_route_class[class_idx] == TINY_ROUTE_LEGACY) {
                // 3. Check Larson fix
                static __thread int g_larson_fix = -1;
                if (__builtin_expect(g_larson_fix == -1, 0)) {
                    const char* e = getenv("HAKMEM_TINY_LARSON_FIX");
                    g_larson_fix = (e && *e && *e != '0') ? 1 : 0;
                }

                if (!g_larson_fix) {
                    // Direct path: Skip policy snapshot, go straight to legacy fallback
                    FREE_PATH_STAT_INC(mono_legacy_direct_hit);
                    tiny_legacy_fallback_free_base_with_env(base, (uint32_t)class_idx, env);
                    return 1;
                }
            }
        }
    }

    // Phase v11b-1: C7 ULTRA early-exit (skip policy snapshot for most common case)
    // Phase 4 E1: Use ENV snapshot when enabled (consolidates 3 TLS reads → 1)
    // Phase 19-3a: Remove UNLIKELY hint (snapshot is ON by default in presets, hint is backwards)
    const bool c7_ultra_free = env ? env->tiny_c7_ultra_enabled : tiny_c7_ultra_enabled_env();

    if (class_idx == 7 && c7_ultra_free) {
        tiny_c7_ultra_free(ptr);
        return 1;
    }

    // Phase POLICY-FAST-PATH-V2: Skip policy snapshot for known-legacy classes
    if (free_policy_fast_v2_can_skip((uint8_t)class_idx)) {
        FREE_PATH_STAT_INC(policy_fast_v2_skip);
        goto legacy_fallback;
    }

    // Phase v11b-1: Policy-based single switch (replaces serial ULTRA checks)
    const SmallPolicyV7* policy_free = small_policy_v7_snapshot();
    SmallRouteKind route_kind_free = policy_free->route_kind[class_idx];

    switch (route_kind_free) {
        case SMALL_ROUTE_ULTRA: {
            // Phase TLS-UNIFY-1: Unified ULTRA TLS push for C4-C6 (C7 handled above)
            if (class_idx >= 4 && class_idx <= 6) {
                tiny_ultra_tls_push((uint8_t)class_idx, base);
                return 1;
            }
            // ULTRA for other classes → fallback to LEGACY
            break;
        }

        case SMALL_ROUTE_MID_V35: {
            // Phase v11a-3: MID v3.5 free
            small_mid_v35_free(ptr, class_idx);
            FREE_PATH_STAT_INC(smallheap_v7_fast);
            return 1;
        }

        case SMALL_ROUTE_V7: {
            // Phase v7: SmallObject v7 free (research box)
            if (small_heap_free_fast_v7_stub(ptr, (uint8_t)class_idx)) {
                FREE_PATH_STAT_INC(smallheap_v7_fast);
                return 1;
            }
            // V7 miss → fallback to LEGACY
            break;
        }

        case SMALL_ROUTE_MID_V3: {
            // Phase MID-V3: delegate to MID v3.5
            small_mid_v35_free(ptr, class_idx);
            FREE_PATH_STAT_INC(smallheap_v7_fast);
            return 1;
        }

        case SMALL_ROUTE_LEGACY:
        default:
            break;
    }

legacy_fallback:
    // LEGACY fallback path
    // Phase 19-6C: Compute route once using helper (avoid redundant tiny_route_for_class)
    tiny_route_kind_t route;
    int use_tiny_heap;
    free_tiny_fast_compute_route_and_heap(class_idx, &route, &use_tiny_heap);

    // TWO-SPEED: SuperSlab registration check is DEBUG-ONLY to keep HOT PATH fast.
    // In Release builds, we trust header magic (0xA0) as sufficient validation.
#if !HAKMEM_BUILD_RELEASE
    // 5. Verify SuperSlab registration (prevents misclassification)
    SuperSlab* ss_guard = hak_super_lookup(ptr);
    if (__builtin_expect(!(ss_guard && ss_guard->magic == SUPERSLAB_MAGIC), 0)) {
        return 0; // not hakmem-managed → normal free path
    }
#endif // !HAKMEM_BUILD_RELEASE

    // Cross-thread free detection (Larson MT crash fix, ENV gated) + TinyHeap free path
    {
        static __thread int g_larson_fix = -1;
        if (__builtin_expect(g_larson_fix == -1, 0)) {
            const char* e = getenv("HAKMEM_TINY_LARSON_FIX");
            g_larson_fix = (e && *e && *e != '0') ? 1 : 0;
#if !HAKMEM_BUILD_RELEASE
            fprintf(stderr, "[LARSON_FIX_INIT] g_larson_fix=%d (env=%s)\n", g_larson_fix, e ? e : "NULL");
            fflush(stderr);
#endif
        }

        if (__builtin_expect(g_larson_fix || use_tiny_heap, 0)) {
            // Phase 12 optimization: Use fast mask-based lookup (~5-10 cycles vs 50-100)
            SuperSlab* ss = ss_fast_lookup(base);
            // Phase FREE-LEGACY-BREAKDOWN-1: counter instrumentation (5. super_lookup call)
            FREE_PATH_STAT_INC(super_lookup_called);
            if (ss) {
                int slab_idx = slab_index_for(ss, base);
                if (__builtin_expect(slab_idx >= 0 && slab_idx < ss_slabs_capacity(ss), 1)) {
                    uint32_t self_tid = tiny_self_u32_local();
                    uint8_t owner_tid_low = ss_slab_meta_owner_tid_low_get(ss, slab_idx);
                    TinySlabMeta* meta = &ss->slabs[slab_idx];
                    // LARSON FIX: Use bits 8-15 for comparison (pthread TIDs aligned to 256 bytes)
                    uint8_t self_tid_cmp = (uint8_t)((self_tid >> 8) & 0xFFu);
#if !HAKMEM_BUILD_RELEASE
                    static _Atomic uint64_t g_owner_check_count = 0;
                    uint64_t oc = atomic_fetch_add(&g_owner_check_count, 1);
                    if (oc < 10) {
                        fprintf(stderr, "[LARSON_FIX] Owner check: ptr=%p owner_tid_low=0x%02x self_tid_cmp=0x%02x self_tid=0x%08x match=%d\n",
                                ptr, owner_tid_low, self_tid_cmp, self_tid, (owner_tid_low == self_tid_cmp));
                        fflush(stderr);
                    }
#endif

                    if (__builtin_expect(owner_tid_low != self_tid_cmp, 0)) {
                        // Cross-thread free → route to remote queue instead of poisoning TLS cache
#if !HAKMEM_BUILD_RELEASE
                        static _Atomic uint64_t g_cross_thread_count = 0;
                        uint64_t ct = atomic_fetch_add(&g_cross_thread_count, 1);
                        if (ct < 20) {
                            fprintf(stderr, "[LARSON_FIX] Cross-thread free detected! ptr=%p owner_tid_low=0x%02x self_tid_cmp=0x%02x self_tid=0x%08x\n",
                                    ptr, owner_tid_low, self_tid_cmp, self_tid);
                            fflush(stderr);
                        }
#endif
                        if (tiny_free_remote_box(ss, slab_idx, meta, ptr, self_tid)) {
                            // Phase FREE-LEGACY-BREAKDOWN-1: counter instrumentation (6. cross-thread free)
                            FREE_PATH_STAT_INC(remote_free);
                            return 1; // handled via remote queue
```

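Steps 1-3 of the `free_tiny_fast` excerpt above (page-boundary guard, magic check, class extraction) can be sketched on their own. This is an illustration under stated assumptions, not the project's code: the `0xA0` magic and the `0xF0` nibble mask come from the excerpt, while the `0x0F` class mask and the class count of 8 are assumed from the "low 4 bits" comment and the C0-C7 naming. All `demo_*` and `DEMO_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HEADER_MAGIC      0xA0u  // high nibble, named in the excerpt
#define DEMO_HEADER_CLASS_MASK 0x0Fu  // assumed from "low 4 bits"
#define DEMO_NUM_CLASSES       8      // assumed: classes C0-C7

// Returns the class index (0..7), or -1 when the fast path should decline.
static int demo_decode_class(const void* ptr) {
    if (!ptr) return -1;
    // Page-boundary guard: at offset 0, ptr-1 would lie on the previous page.
    if (((uintptr_t)ptr & 0xFFFu) == 0) return -1;
    uint8_t header = *((const uint8_t*)ptr - 1);
    if ((header & 0xF0u) != DEMO_HEADER_MAGIC) return -1;
    int class_idx = (int)(header & DEMO_HEADER_CLASS_MASK);
    if (class_idx >= DEMO_NUM_CLASSES) return -1;
    return class_idx;
}

static void demo_header_check(void) {
    // 8 KiB buffer so that a page-aligned address and an address with a
    // non-zero page offset both exist inside it.
    static uint8_t buf[8192];
    uint8_t* page = (uint8_t*)(((uintptr_t)buf + 4095u) & ~(uintptr_t)4095u);
    uint8_t* user = page + 16;  // page offset 16, header byte at page[15]
    page[15] = 0xA5;
    assert(demo_decode_class(user) == 5);
    page[15] = 0x35;            // wrong magic
    assert(demo_decode_class(user) == -1);
    page[15] = 0xA9;            // class 9 is out of range here
    assert(demo_decode_class(user) == -1);
    assert(demo_decode_class(page) == -1);  // page start: header is not read
}
```

The guard runs before the header load, so a page-aligned pointer is rejected without touching `ptr - 1`.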
|
### `core/box/tiny_front_hot_box.h`
|
||||||
|
```c
|
||||||
|
static inline int tiny_hot_free_fast(int class_idx, void* base) {
|
||||||
|
extern __thread TinyUnifiedCache g_unified_cache[];
|
||||||
|
|
||||||
|
// TLS cache access (1 cache miss)
|
||||||
|
    // NOTE: Range check removed - caller guarantees valid class_idx
    TinyUnifiedCache* cache = &g_unified_cache[class_idx];

#if HAKMEM_TINY_UNIFIED_LIFO_COMPILED
    // Phase 15 v1: Mode check at entry (once per call, not scattered in hot path)
    // Phase 22: Compile-out when disabled (default OFF)
    int lifo_mode = tiny_unified_lifo_enabled();

    // Phase 15 v1: LIFO vs FIFO mode switch
    if (lifo_mode) {
        // === LIFO MODE: Stack-based (LIFO) ===
        // Try push to stack (tail is stack depth)
        if (unified_cache_try_push_lifo(class_idx, base)) {
#if !HAKMEM_BUILD_RELEASE
            extern __thread uint64_t g_unified_cache_push[];
            g_unified_cache_push[class_idx]++;
#endif
            return 1;  // SUCCESS
        }
        // LIFO overflow → fall through to cold path
#if !HAKMEM_BUILD_RELEASE
        extern __thread uint64_t g_unified_cache_full[];
        g_unified_cache_full[class_idx]++;
#endif
        return 0;  // FULL
    }
#endif

    // === FIFO MODE: Ring-based (existing, default) ===
    // Calculate next tail (for full check)
    uint16_t next_tail = (cache->tail + 1) & cache->mask;

    // Branch 1: Cache full check (UNLIKELY full)
    // Hot path: cache has space (next_tail != head)
    // Cold path: cache full (next_tail == head) → drain needed
    if (TINY_HOT_LIKELY(next_tail != cache->head)) {
        // === HOT PATH: Cache has space (2-3 instructions) ===

        // Push to cache (1 cache miss for array write)
        cache->slots[cache->tail] = base;
        cache->tail = next_tail;

        // Debug metrics (zero overhead in release)
#if !HAKMEM_BUILD_RELEASE
        extern __thread uint64_t g_unified_cache_push[];
        g_unified_cache_push[class_idx]++;
#endif

        return 1;  // SUCCESS
    }

    // === COLD PATH: Cache full ===
    // Don't drain here - let caller handle via tiny_cold_drain_and_free()
#if !HAKMEM_BUILD_RELEASE
    extern __thread uint64_t g_unified_cache_full[];
    g_unified_cache_full[class_idx]++;
#endif

    return 0;  // FULL
}
```
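
The FIFO branch above appears to be the usual power-of-two ring that keeps one slot empty, so that `next_tail == head` can mean "full". The sketch below illustrates that technique in isolation. `DemoRing`, `RING_CAP` and the `demo_ring_*` functions are hypothetical names invented for this example, and the layout is an assumption, not hakmem's actual `TinyUnifiedCache`.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 8u /* hypothetical capacity; must be a power of two */

typedef struct {
    void*    slots[RING_CAP];
    uint16_t head; /* next slot to pop  */
    uint16_t tail; /* next slot to push */
    uint16_t mask; /* RING_CAP - 1, so "& mask" wraps the index */
} DemoRing;

static void demo_ring_init(DemoRing* r) {
    r->head = 0;
    r->tail = 0;
    r->mask = (uint16_t)(RING_CAP - 1u);
}

/* Returns 1 on success, 0 when the ring is full (caller drains elsewhere). */
static int demo_ring_push(DemoRing* r, void* p) {
    uint16_t next_tail = (uint16_t)((r->tail + 1u) & r->mask);
    if (next_tail == r->head) {
        return 0; /* full: one slot is always left empty */
    }
    r->slots[r->tail] = p;
    r->tail = next_tail;
    return 1;
}

/* Returns NULL when the ring is empty. */
static void* demo_ring_pop(DemoRing* r) {
    if (r->head == r->tail) {
        return NULL;
    }
    void* p = r->slots[r->head];
    r->head = (uint16_t)((r->head + 1u) & r->mask);
    return p;
}
```

With this scheme a ring of capacity 8 holds at most 7 entries. That is the price of distinguishing "full" from "empty" using only `head` and `tail`, with no separate count to update on the hot path.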

### `core/box/tiny_legacy_fallback_box.h`
```c
static inline void tiny_legacy_fallback_free_base_with_env(void* base, uint32_t class_idx, const HakmemEnvSnapshot* env) {
    // Phase 80-1: Switch dispatch for C4/C5/C6 (branch reduction optimization)
    // Phase 83-1: Per-op branch removed via fixed-mode caching
    // C2/C3 excluded (NO-GO from Phase 77-1/79-1)
    if (tiny_inline_slots_switch_dispatch_enabled_fast()) {
        // Switch mode: Direct jump to case (zero comparison overhead for C4/C5/C6)
        switch (class_idx) {
            case 4:
                if (tiny_c4_inline_slots_enabled_fast()) {
                    if (c4_inline_push(c4_inline_tls(), base)) {
                        FREE_PATH_STAT_INC(legacy_fallback);
                        if (__builtin_expect(free_path_stats_enabled(), 0)) {
                            g_free_path_stats.legacy_by_class[class_idx]++;
                        }
                        return;
                    }
                }
                break;
            case 5:
                if (tiny_c5_inline_slots_enabled_fast()) {
                    if (c5_inline_push(c5_inline_tls(), base)) {
                        FREE_PATH_STAT_INC(legacy_fallback);
                        if (__builtin_expect(free_path_stats_enabled(), 0)) {
                            g_free_path_stats.legacy_by_class[class_idx]++;
                        }
                        return;
                    }
                }
                break;
            case 6:
                if (tiny_c6_inline_slots_enabled_fast()) {
                    if (c6_inline_push(c6_inline_tls(), base)) {
                        FREE_PATH_STAT_INC(legacy_fallback);
                        if (__builtin_expect(free_path_stats_enabled(), 0)) {
                            g_free_path_stats.legacy_by_class[class_idx]++;
                        }
                        return;
                    }
                }
                break;
            default:
                // C0-C3, C7: fall through to unified_cache push
                break;
        }
        // Switch mode: fall through to unified_cache push after miss
    } else {
        // If-chain mode (Phase 80-1 baseline): C3/C4/C5/C6 sequential checks
        // NOTE: C2 local cache (Phase 79-1 NO-GO) removed from hot path

        // Phase 77-1: C3 Inline Slots early-exit (ENV gated)
        // Checked first in the chain (before C4/C5/C6/unified cache) for class 3
        if (class_idx == 3 && tiny_c3_inline_slots_enabled_fast()) {
            if (c3_inline_push(c3_inline_tls(), base)) {
                // Success: pushed to C3 inline slots
                FREE_PATH_STAT_INC(legacy_fallback);
                if (__builtin_expect(free_path_stats_enabled(), 0)) {
                    g_free_path_stats.legacy_by_class[class_idx]++;
                }
                return;
            }
            // FULL → fall through to C4/C5/C6/unified cache
        }

        // Phase 76-1: C4 Inline Slots early-exit (ENV gated)
        // Checked second in the chain (before C5/C6/unified cache) for class 4
        if (class_idx == 4 && tiny_c4_inline_slots_enabled_fast()) {
            if (c4_inline_push(c4_inline_tls(), base)) {
                // Success: pushed to C4 inline slots
                FREE_PATH_STAT_INC(legacy_fallback);
                if (__builtin_expect(free_path_stats_enabled(), 0)) {
                    g_free_path_stats.legacy_by_class[class_idx]++;
                }
                return;
            }
            // FULL → fall through to C5/C6/unified cache
        }

        // Phase 75-2: C5 Inline Slots early-exit (ENV gated)
        // Checked third in the chain (before C6 and unified cache) for class 5
        if (class_idx == 5 && tiny_c5_inline_slots_enabled_fast()) {
            if (c5_inline_push(c5_inline_tls(), base)) {
                // Success: pushed to C5 inline slots
                FREE_PATH_STAT_INC(legacy_fallback);
                if (__builtin_expect(free_path_stats_enabled(), 0)) {
                    g_free_path_stats.legacy_by_class[class_idx]++;
                }
                return;
            }
            // FULL → fall through to C6/unified cache
        }

        // Phase 75-1: C6 Inline Slots early-exit (ENV gated)
        // Checked fourth in the chain (before unified cache) for class 6
        if (class_idx == 6 && tiny_c6_inline_slots_enabled_fast()) {
            if (c6_inline_push(c6_inline_tls(), base)) {
                // Success: pushed to C6 inline slots
                FREE_PATH_STAT_INC(legacy_fallback);
                if (__builtin_expect(free_path_stats_enabled(), 0)) {
                    g_free_path_stats.legacy_by_class[class_idx]++;
                }
                return;
            }
            // FULL → fall through to unified cache
        }
    }  // End of if-chain mode

    const TinyFrontV3Snapshot* front_snap =
        env ? (env->tiny_front_v3_enabled ? tiny_front_v3_snapshot_get() : NULL)
            : (__builtin_expect(tiny_front_v3_enabled(), 0) ? tiny_front_v3_snapshot_get() : NULL);
    const bool metadata_cache_on = env ? env->tiny_metadata_cache_eff : tiny_metadata_cache_enabled();

    // Phase 3 C2 Patch 2: First page cache hint (optional fast-path)
    // Check if pointer is in cached page (avoids metadata lookup in future optimizations)
    if (__builtin_expect(metadata_cache_on, 0)) {
        // Note: This is a hint-only check. Even if it hits, we still use the standard path.
        // The cache will be populated during refill operations for future use.
        // Currently this just validates the cache state; actual optimization TBD.
        if (tiny_first_page_cache_hit(class_idx, base, 4096)) {
            // Future: could optimize metadata access here
        }
    }

    // Legacy fallback - Unified Cache push
    if (!front_snap || front_snap->unified_cache_on) {
        // Phase 74-3 (P0): FASTAPI path (ENV-gated)
        if (tiny_uc_fastapi_enabled()) {
            // Preconditions guaranteed:
            // - unified_cache_on == true (checked above)
            // - TLS init guaranteed by front_gate_unified_enabled() in malloc_tiny_fast.h
            // - Stats compiled-out in FAST builds
            if (unified_cache_push_fast(class_idx, HAK_BASE_FROM_RAW(base))) {
                FREE_PATH_STAT_INC(legacy_fallback);

                // Per-class breakdown (Phase 4-1)
                if (__builtin_expect(free_path_stats_enabled(), 0)) {
                    if (class_idx < 8) {
                        g_free_path_stats.legacy_by_class[class_idx]++;
                    }
                }
                return;
            }
            // FULL → fallback to slow path (rare)
        }

        // Original path (FASTAPI=0 or fallback)
        if (unified_cache_push(class_idx, HAK_BASE_FROM_RAW(base))) {
            FREE_PATH_STAT_INC(legacy_fallback);

            // Per-class breakdown (Phase 4-1)
            if (__builtin_expect(free_path_stats_enabled(), 0)) {
                if (class_idx < 8) {
                    g_free_path_stats.legacy_by_class[class_idx]++;
                }
            }
            return;
        }
    }

    // Final fallback
    tiny_hot_free_fast(class_idx, base);
}
```

## Questions to answer (please be concrete)

1) In these snippets, which checks/branches are still "per-op fixed taxes" on the hot free path?
- Please point to specific lines/conditions and estimate cost (branches/instructions or dependency chain).

2) Is `tiny_hot_free_fast()` already close to optimal, with the real bottleneck upstream (user->base/classify/route)?
- If yes, what’s the smallest structural refactor that removes that upstream fixed tax?

3) Should we introduce a "commit once" plan (freeze the chosen free path), or is branch prediction already making lazy-init checks ~free here?
- If "commit once", where should it live to avoid runtime gate overhead (bench_profile refresh boundary vs per-op)?

4) We have had many layout-tax regressions from code removal/reordering.
- What patterns here are most likely to trigger layout tax if changed?
- How would you stage a safe A/B (same binary, ENV toggle) for your proposal?

5) If you could change just ONE of:
- pointer classification to base/class_idx,
- route determination,
- unified cache push/pop structure,
which is highest ROI for +5–10% on WS=400?

[packet] done
394
docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_PLAN.md
Normal file
@ -0,0 +1,394 @@
# Phase 85: Free Path Commit-Once (LEGACY-only) Implementation Plan

## 1. Objective & Scope

**Goal**: Eliminate per-operation policy/route/mono ceremony overhead in `free_tiny_fast()` for the LEGACY route by applying the Phase 78-1 "commit-once" pattern.

**Target**: +2.0% improvement (GO threshold)

**Scope**:
- LEGACY route only (classes C4-C7, size 129-256 bytes)
- Does NOT apply to ULTRA/MID/V7 routes
- Must coexist with existing Phase 9 (MONO DUALHOT) and Phase 10 (MONO LEGACY DIRECT) optimizations
- Fail-fast if HAKMEM_TINY_LARSON_FIX is enabled (owner_tid validation is incompatible with commit-once)

**Strategy**: Cache the route + handler mapping at init time (bench_profile refresh boundary) and skip 12-20 branches per free() in the hot path.

---

## 2. Architecture & Design

### 2.1 Core Pattern (Phase 78-1 Adaptation)

Following the successful Phase 78-1 pattern:

```
┌─────────────────────────────────────────────────────┐
│ Init-time (bench_profile refresh boundary)          │
│ ─────────────────────────────────────────────────── │
│ free_path_commit_once_refresh_from_env()            │
│   ├─ Read ENV: HAKMEM_FREE_PATH_COMMIT_ONCE=0/1     │
│   ├─ Fail-fast: if LARSON_FIX enabled → disable     │
│   ├─ For C4-C7 (LEGACY classes):                    │
│   │  └─ Compute: route_kind, handler function       │
│   │  └─ Store: g_free_path_commit_once_fixed[4]     │
│   └─ Set: g_free_path_commit_once_enabled = true    │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│ Hot path (every free)                               │
│ ─────────────────────────────────────────────────── │
│ free_tiny_fast()                                    │
│   if (free_path_commit_once_enabled_fast()) {       │
│     // NEW: Direct dispatch, skip all ceremony      │
│     const struct FreePatchCommitOnceEntry* e =      │
│       &g_free_path_commit_once_fixed[               │
│           class_idx - TINY_C4];                     │
│     e->handler(ptr, class_idx, heap);               │
│     return;                                         │
│   }                                                 │
│   // Fallback: existing Phase 9/10/policy/route     │
│   ...                                               │
└─────────────────────────────────────────────────────┘
```

### 2.2 Cached State Structure

```c
typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);

struct FreePatchCommitOnceEntry {
    TinyRouteKind route_kind;   // LEGACY, ULTRA, MID, V7 (validation only)
    FreeTinyHandler handler;    // Direct function pointer
    uint8_t valid;              // Safety flag
};

// Global state (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;
```

### 2.3 What Gets Cached

For each LEGACY class (C4-C7):
- **route_kind**: Expected to be `TINY_ROUTE_LEGACY`
- **handler**: Function pointer to `tiny_legacy_fallback_free_base_with_env` or another appropriate handler
- **valid**: Safety flag (1 if the cache entry is valid)

### 2.4 Eliminated Overhead

**Before** (15-26 branches per free):
1. Phase 9 MONO DUALHOT check (3-5 branches)
2. Phase 10 MONO LEGACY DIRECT check (4-6 branches)
3. Policy snapshot call `small_policy_v7_snapshot()` (5-10 branches, potential getenv)
4. Route computation `tiny_route_for_class()` (3-5 branches)
5. Switch on route_kind (1-2 branches)

**After** (commit-once enabled, LEGACY classes):
1. Master gate check `free_path_commit_once_enabled_fast()` (1 branch, predicted taken)
2. Class index range check (1 branch, predicted taken)
3. Cached entry lookup (0 branches, direct memory load)
4. Direct handler dispatch (1 indirect call)

**Branch reduction**: 12-20 branches per LEGACY free → **Estimated +2-3% improvement**

---

## 3. Files to Create/Modify

### 3.1 New Files (Box Pattern)

#### `core/box/free_path_commit_once_fixed_box.h`
```c
#ifndef HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H
#define HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H

#include <stdbool.h>
#include <stdint.h>
#include "core/hakmem_tiny_defs.h"

typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap);

struct FreePatchCommitOnceEntry {
    TinyRouteKind route_kind;
    FreeTinyHandler handler;
    uint8_t valid;
};

// Global cache (4 entries for C4-C7)
extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
extern bool g_free_path_commit_once_enabled;

// Fast-path API (inlined, no fallback needed)
static inline bool free_path_commit_once_enabled_fast(void) {
    return __builtin_expect(g_free_path_commit_once_enabled, 0);
}

// Refresh (called once at bench_profile boundary)
void free_path_commit_once_refresh_from_env(void);

#endif
```

#### `core/box/free_path_commit_once_fixed_box.c`
```c
#include "free_path_commit_once_fixed_box.h"
#include "core/box/tiny_env_box.h"
#include "core/box/tiny_larson_fix_env_box.h"
#include "core/hakmem_tiny.h"
#include <stdio.h>   // fprintf, stderr
#include <stdlib.h>
#include <string.h>

struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4];
bool g_free_path_commit_once_enabled = false;

void free_path_commit_once_refresh_from_env(void) {
    // Read master ENV gate
    const char* env_val = getenv("HAKMEM_FREE_PATH_COMMIT_ONCE");
    bool requested = (env_val && atoi(env_val) == 1);

    if (!requested) {
        g_free_path_commit_once_enabled = false;
        return;
    }

    // Fail-fast: LARSON_FIX incompatible with commit-once
    if (tiny_larson_fix_enabled()) {
        fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n");
        g_free_path_commit_once_enabled = false;
        return;
    }

    // Pre-compute route + handler for C4-C7 (LEGACY)
    for (unsigned i = 0; i < 4; i++) {
        unsigned class_idx = TINY_C4 + i;

        // Route determination (expect LEGACY for C4-C7)
        TinyRouteKind route = tiny_route_for_class(class_idx);

        // Handler selection (simplified, matches free_tiny_fast logic)
        FreeTinyHandler handler = NULL;

        if (route == TINY_ROUTE_LEGACY) {
            // NOTE: tiny_legacy_fallback_free_base_with_env() is declared as
            // (void* base, uint32_t class_idx, const HakmemEnvSnapshot* env);
            // FreeTinyHandler must be declared with that same signature.
            handler = tiny_legacy_fallback_free_base_with_env;
        } else {
            // Unexpected route, fail-fast
            fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: C%u route=%d not LEGACY, disabling\n",
                    class_idx, (int)route);
            g_free_path_commit_once_enabled = false;
            return;
        }

        g_free_path_commit_once_fixed[i].route_kind = route;
        g_free_path_commit_once_fixed[i].handler = handler;
        g_free_path_commit_once_fixed[i].valid = 1;
    }

    g_free_path_commit_once_enabled = true;
}
```

### 3.2 Modified Files

#### `core/front/malloc_tiny_fast.h` (free_tiny_fast function)

**Insertion point**: Line ~950, before Phase 9/10 checks

```c
static void free_tiny_fast(void* ptr, unsigned class_idx, TinyHeap* heap, ...) {
    // NEW: Phase 85 commit-once fast path (LEGACY classes only)
#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    if (free_path_commit_once_enabled_fast()) {
        if (class_idx >= TINY_C4 && class_idx <= TINY_C7) {
            const unsigned cache_idx = class_idx - TINY_C4;
            const struct FreePatchCommitOnceEntry* entry =
                &g_free_path_commit_once_fixed[cache_idx];

            if (__builtin_expect(entry->valid, 1)) {
                entry->handler(ptr, class_idx, heap);
                return;
            }
        }
    }
#endif

    // Existing Phase 9/10/policy/route ceremony (fallback)
    ...
}
```

#### `core/bench_profile.h` (refresh function integration)

Add to `refresh_all_env_caches()`:

```c
void refresh_all_env_caches(void) {
    // ... existing refreshes ...

#if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED
    free_path_commit_once_refresh_from_env();
#endif
}
```

#### `Makefile` (box flag)

Add new box flag:

```makefile
BOX_FREE_PATH_COMMIT_ONCE_FIXED ?= 1
CFLAGS += -DHAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED=$(BOX_FREE_PATH_COMMIT_ONCE_FIXED)
```

---

## 4. Implementation Stages

### Stage 1: Box Infrastructure (1-2 hours)
1. Create `free_path_commit_once_fixed_box.h` with struct definition, global declarations, fast-path API
2. Create `free_path_commit_once_fixed_box.c` with refresh implementation
3. Add Makefile box flag
4. Integrate refresh call into `core/bench_profile.h`
5. **Validation**: Compile, verify no build errors

### Stage 2: Hot Path Integration (1 hour)
1. Modify `core/front/malloc_tiny_fast.h` to add Phase 85 fast path at line ~950
2. Add class range check (C4-C7) and cache lookup
3. Add handler dispatch with validity check
4. **Validation**: Compile, verify no build errors, run basic functionality test

### Stage 3: Fail-Fast Safety (30 min)
1. Test LARSON_FIX=1 scenario, verify commit-once disabled
2. Test invalid route scenario (C4-C7 with non-LEGACY route)
3. **Validation**: Both scenarios should log fail-fast message and fall back to standard path

### Stage 4: A/B Testing (2-3 hours)
1. Build single binary with box flag enabled
2. Baseline test: `HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh`
3. Treatment test: `HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh`
4. Compare mean/median/CV, calculate delta
5. **GO criteria**: +2.0% or better
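
The "compare mean/median/CV, calculate delta" step in Stage 4 can be sketched as below. This is only an illustration under stated assumptions: the `demo_*` helpers are hypothetical and not part of the hakmem scripts, CV is taken as sample standard deviation over mean, and a small Newton iteration stands in for `sqrt` so the sketch builds without `-lm`.

```c
#include <stddef.h>
#include <stdlib.h>

static int demo_cmp_double(const void* a, const void* b) {
    double x = *(const double*)a, y = *(const double*)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean; caller must pass n >= 1. */
static double demo_mean(const double* v, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s / (double)n;
}

/* Median of up to 64 samples; sorts a private copy, input is untouched. */
static double demo_median(const double* v, size_t n) {
    double tmp[64];
    if (n == 0 || n > 64) return 0.0;
    for (size_t i = 0; i < n; i++) tmp[i] = v[i];
    qsort(tmp, n, sizeof tmp[0], demo_cmp_double);
    return (n % 2) ? tmp[n / 2] : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
}

/* Newton iteration for sqrt(x), used only to avoid a libm dependency. */
static double demo_sqrt(double x) {
    if (x <= 0.0) return 0.0;
    double g = (x > 1.0) ? x : 1.0;
    for (int i = 0; i < 60; i++) g = 0.5 * (g + x / g);
    return g;
}

/* Coefficient of variation in percent (sample stddev / mean); needs n >= 2. */
static double demo_cv_percent(const double* v, size_t n) {
    double m = demo_mean(v, n), ss = 0.0;
    for (size_t i = 0; i < n; i++) ss += (v[i] - m) * (v[i] - m);
    return 100.0 * demo_sqrt(ss / (double)(n - 1)) / m;
}

/* Relative change of treatment vs control, in percent. */
static double demo_delta_percent(double control, double treatment) {
    return 100.0 * (treatment - control) / control;
}
```

Feeding the ten control and ten treatment throughputs through `demo_mean` and `demo_median`, then through `demo_delta_percent`, gives the mean and median deltas. Reporting both is useful because the two can disagree in sign when the effect is within run-to-run noise.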

---

## 5. Test Plan

### 5.1 SSOT Baseline (10-run)

```bash
# Control (commit-once disabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_control.txt

# Treatment (commit-once enabled)
HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_treatment.txt
```

**Expected baseline**: 55.53M ops/s (from recent allocator matrix)

**GO threshold**: 55.53M × 1.02 = **56.64M ops/s** (treatment mean)

### 5.2 Safety Tests

```bash
# Test 1: LARSON_FIX incompatibility
HAKMEM_TINY_LARSON_FIX=1 HAKMEM_FREE_PATH_COMMIT_ONCE=1 ./bench_random_mixed_hakmem 1000000 400 1
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible"

# Test 2: Invalid route scenario (manually inject via debugging)
# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: C4 route=X not LEGACY"
```

### 5.3 Performance Profile

Optional (if time permits):

```bash
# Perf stat comparison
HAKMEM_FREE_PATH_COMMIT_ONCE=0 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1
HAKMEM_FREE_PATH_COMMIT_ONCE=1 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1
```

**Expected**: 8-12% reduction in branches, <1% change in branch misses

---

## 6. Rollback Strategy

### Immediate Rollback (No Recompile)
```bash
export HAKMEM_FREE_PATH_COMMIT_ONCE=0
```

### Box Removal (Recompile)
```bash
make clean
BOX_FREE_PATH_COMMIT_ONCE_FIXED=0 make bench_random_mixed_hakmem
```

### File Reversions
- Remove: `core/box/free_path_commit_once_fixed_box.{h,c}`
- Revert: `core/front/malloc_tiny_fast.h` (remove Phase 85 block)
- Revert: `core/bench_profile.h` (remove refresh call)
- Revert: `Makefile` (remove box flag)

---

## 7. Expected Results

### 7.1 Performance Target

| Metric | Control | Treatment | Delta | Status |
|--------|---------|-----------|-------|--------|
| Mean (M ops/s) | 55.53 | 56.64+ | +2.0%+ | GO threshold |
| CV (%) | 1.5-2.0 | 1.5-2.0 | stable | required |
| Branch reduction | baseline | -8% to -12% | ~10% | expected |

### 7.2 GO/NO-GO Decision

**GO if**:
- Treatment mean ≥ 56.64M ops/s (+2.0%)
- CV remains stable (<3%)
- No regressions in other scenarios (json/mir/vm)
- Fail-fast tests pass

**NO-GO if**:
- Treatment mean < 56.64M ops/s
- CV increases significantly (>3%)
- Regressions observed
- Fail-fast mechanisms fail
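
The GO/NO-GO rule above is mechanical enough to write down as a predicate. The sketch below is a hypothetical helper (`DemoAbResult` and `demo_is_go` are names invented here, not part of the benchmark scripts) that treats GO as the conjunction of all four GO conditions, so any single failed condition yields NO-GO.

```c
#include <stdbool.h>

typedef struct {
    double control_mean;         /* M ops/s, control runs   */
    double treatment_mean;       /* M ops/s, treatment runs */
    double treatment_cv_percent; /* CV of the treatment runs */
    bool   regressions_seen;     /* any regression in other scenarios */
    bool   fail_fast_ok;         /* fail-fast tests passed */
} DemoAbResult;

/* Returns true for GO, false for NO-GO. */
static bool demo_is_go(const DemoAbResult* r,
                       double min_gain_percent,  /* e.g. 2.0 */
                       double max_cv_percent) {  /* e.g. 3.0 */
    double threshold = r->control_mean * (1.0 + min_gain_percent / 100.0);
    return r->treatment_mean >= threshold
        && r->treatment_cv_percent < max_cv_percent
        && !r->regressions_seen
        && r->fail_fast_ok;
}
```

With a control mean of 55.53 and a 2.0% minimum gain, the computed threshold is 56.6406, which matches the 56.64M ops/s figure used in this plan.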

### 7.3 Risk Assessment

**Low Risk**:
- Scope limited to LEGACY route (C4-C7, 129-256 bytes)
- ENV gate allows instant rollback
- Fail-fast for LARSON_FIX ensures safety
- Phase 9/10 MONO optimizations unaffected (fall through on cache miss)

**Potential Issues**:
- Layout tax: New code path may cause I-cache/register pressure (mitigated by early placement at line ~950)
- Indirect call overhead: Cached function pointer may have misprediction cost (likely negligible vs branch reduction)
- Route dynamics: If route changes at runtime (unlikely), commit-once becomes stale (requires bench_profile refresh)

---

## 8. Success Criteria Summary

1. ✅ Build completes without errors
2. ✅ Fail-fast tests pass (LARSON_FIX=1, invalid route)
3. ✅ SSOT 10-run treatment ≥ 56.64M ops/s (+2.0%)
4. ✅ CV remains stable (<3%)
5. ✅ No regressions in other scenarios

**If all criteria met**: Merge to master, update CURRENT_TASK.md, record in PERFORMANCE_TARGETS_SCORECARD.md

**If NO-GO**: Keep as research box, document findings, archive plan.

---

## 9. References

- Phase 78-1 pattern: `core/box/tiny_inline_slots_fixed_mode_box.{h,c}`
- Free path implementation: `core/front/malloc_tiny_fast.h:919-1221`
- LARSON_FIX constraint: `core/box/tiny_larson_fix_env_box.h`
- Route snapshot: `core/hakmem_tiny.c:64-65` (g_tiny_route_class, g_tiny_route_snapshot_done)
- SSOT validation: `scripts/run_mixed_10_cleanenv.sh`
68
docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_RESULTS.md
Normal file
@ -0,0 +1,68 @@
# Phase 85: Free Path Commit-Once (LEGACY-only): Results

## Goal

On the free path in `free_tiny_fast()`, take the **"ceremony" performed before falling back to LEGACY (mono/policy/route computation)**,
commit it once at the bench_profile boundary, and **remove it from the hot path**.

- Scope: **LEGACY route only**, C4–C7
- Reversible: `HAKMEM_FREE_PATH_COMMIT_ONCE=0/1`
- Safety: if `HAKMEM_TINY_LARSON_FIX=1`, fail fast and disable the commit

## Implementation

- New box:
  - `core/box/free_path_commit_once_fixed_box.h`
  - `core/box/free_path_commit_once_fixed_box.c`
- Integration:
  - `core/bench_profile.h` calls `free_path_commit_once_refresh_from_env()`
  - `free_tiny_fast()` in `core/front/malloc_tiny_fast.h` dispatches to the cached handler early, before the Phase 9/10 checks
- Build:
  - Added `core/box/free_path_commit_once_fixed_box.o` to the `Makefile`

## A/B Results (SSOT, 10-run)

Control (`HAKMEM_FREE_PATH_COMMIT_ONCE=0`)
- Mean: 52.75M ops/s
- Median: 52.94M ops/s
- Min: 51.70M ops/s
- Max: 53.77M ops/s

Treatment (`HAKMEM_FREE_PATH_COMMIT_ONCE=1`)
- Mean: 52.30M ops/s
- Median: 52.42M ops/s
- Min: 51.04M ops/s
- Max: 53.03M ops/s

Delta: **-0.86% (NO-GO)**

## Diagnosis

### 1) Overlap with Phase 10 (MONO LEGACY DIRECT)

`free_tiny_fast_mono_legacy_direct_enabled()` already provides a **direct path for C4–C7** (skipping the policy snapshot), so
there was little ceremony left for Phase 85 to remove.

As a result, Phase 85 brings in an **additional gate and table lookup**, which makes a net gain unlikely.

### 2) Function pointer dispatch tax

Phase 85 introduces an **indirect call**, `entry->handler(base, class_idx, env)`.
This kind of indirect branch is sensitive to the branch predictor and to code layout, and can lose on net in the SSOT benchmark.

### 3) Possible layout tax

Inserting new code into the free hot path (`free_tiny_fast`) perturbs the text layout,
which makes a sign flip on the order of -0.x% likely (a known pattern).

## Decision

- **NO-GO**: keep `HAKMEM_FREE_PATH_COMMIT_ONCE` as a **default-OFF research box**
- Do not physically delete the code (to avoid a layout-tax sign flip)

## Follow-ups (if revisiting)

1. Drop the handler cache and reduce commit-once to a **bitmask (legacy_mask) only** (eliminates the indirect call).
2. Keep the shape in which the hot path can exit before taking the `env snapshot`, and limit the hot side to **a single early return**.
3. Consider "replacement" in Phase 86 only after the conditions for compiling out Phase 9/10 are in place (prioritize same-binary A/B).

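
Follow-up 1 (a bitmask instead of a handler table) can be sketched as below. This is a minimal illustration under stated assumptions, not the project's implementation: every `demo_*` / `g_demo_*` name is hypothetical, the mask value `0xF0` (bits 4-7, i.e. C4-C7) is chosen only to match the scope stated for Phase 85, and `demo_legacy_free` stands in for the real LEGACY handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit i set => class i takes the LEGACY path directly. 0 means "disabled". */
static uint8_t  g_demo_legacy_mask;
static unsigned g_demo_direct_calls; /* counter so the sketch is observable */

/* Stand-in for the real LEGACY free handler (assumption for illustration). */
static void demo_legacy_free(void* base, uint32_t class_idx) {
    (void)base;
    (void)class_idx;
    g_demo_direct_calls++;
}

/* Refresh boundary: decide once, outside the hot path. */
static void demo_legacy_mask_refresh(bool requested, bool larson_fix_on,
                                     uint8_t legacy_classes) {
    /* Fail-fast: cross-thread validation is incompatible, so leave mask at 0. */
    g_demo_legacy_mask = (requested && !larson_fix_on) ? legacy_classes : 0;
}

/* Hot path: one mask test, then a direct (not indirect) call.
 * Returns true if the free was handled here, false to fall through. */
static inline bool demo_free_try_legacy_mask(void* base, uint32_t class_idx) {
    if (class_idx < 8u && ((g_demo_legacy_mask >> class_idx) & 1u)) {
        demo_legacy_free(base, class_idx);
        return true;
    }
    return false;
}
```

Folding the "enabled" state into the mask itself (zero means off) keeps the hot side to a single early return, as follow-up 2 asks, and because the callee is named directly the compiler is free to inline it.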
@ -39,3 +39,11 @@
- Only "delete" a research box when all of the following conditions hold:
  - (1) it has not been used for at least two weeks, (2) SSOT/bench_profile/cleanenv do not reference it,
    (3) a same-binary A/B has confirmed that deleting it does not change performance (no layout tax).

## SSOT for external consultation (paste-in packet)

As frozen boxes accumulate, it gets harder to explain to outsiders which path is actually being taken,
so review requests use the "compressed packet" as the source of truth:

- Generate: `scripts/make_chatgpt_pro_packet_free_path.sh`
- Snapshot: `docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md`

4
hakmem.d
@ -198,6 +198,8 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
 core/box/../front/../box/free_cold_shape_stats_box.h \
 core/box/../front/../box/free_tiny_fast_mono_dualhot_env_box.h \
 core/box/../front/../box/free_tiny_fast_mono_legacy_direct_env_box.h \
 core/box/../front/../box/free_path_commit_once_fixed_box.h \
 core/box/../front/../box/free_path_legacy_mask_box.h \
 core/box/../front/../box/alloc_passdown_ssot_env_box.h \
 core/box/tiny_alloc_gate_box.h core/box/tiny_route_box.h \
 core/box/tiny_alloc_gate_shape_env_box.h \
@ -489,6 +491,8 @@ core/box/../front/../box/free_cold_shape_env_box.h:
core/box/../front/../box/free_cold_shape_stats_box.h:
core/box/../front/../box/free_tiny_fast_mono_dualhot_env_box.h:
core/box/../front/../box/free_tiny_fast_mono_legacy_direct_env_box.h:
core/box/../front/../box/free_path_commit_once_fixed_box.h:
core/box/../front/../box/free_path_legacy_mask_box.h:
core/box/../front/../box/alloc_passdown_ssot_env_box.h:
core/box/tiny_alloc_gate_box.h:
core/box/tiny_route_box.h:
127
scripts/make_chatgpt_pro_packet_free_path.sh
Executable file
@ -0,0 +1,127 @@
|
|||||||
|
#!/usr/bin/env bash
set -euo pipefail

# Generate a compact "free-path review packet" for sharing with ChatGPT Pro.
# Output: Markdown to stdout (copy/paste).
#
# Usage:
#   scripts/make_chatgpt_pro_packet_free_path.sh > /tmp/free_path_packet.md
#
# Notes:
# - Extracts key functions with a simple brace counter.
# - Clips each snippet to keep it shareable.

root_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "${root_dir}"

# Default clip is intentionally small; you can override via CLIP_LINES=...
clip="${CLIP_LINES:-160}"

need() { command -v "$1" >/dev/null 2>&1 || { echo "[packet] missing $1" >&2; exit 1; }; }
need awk
need sed

extract_func_n_clip() {
  local file="$1"
  local re="$2"
  local nth="$3"
  local clip_lines="$4"

  awk -v re="${re}" -v nth="${nth}" '
    function count_char(s, c, i,n) { n=0; for (i=1;i<=length(s);i++) if (substr(s,i,1)==c) n++; return n }
    BEGIN { hit=0; started=0; depth=0; seen_open=0 }
    {
      if (!started) {
        if ($0 ~ re) {
          hit++;
          if (hit == nth) {
            started=1;
          }
        }
      }
      if (started) {
        print $0;
        depth += count_char($0, "{");
        if (count_char($0, "{") > 0) seen_open=1;
        depth -= count_char($0, "}");
        if (seen_open && depth <= 0) exit 0;
      }
    }
  ' "${file}" | sed -n "1,${clip_lines}p"
}

extract_func() {
  extract_func_n_clip "$1" "$2" 1 "${clip}"
}

md_code() {
  local lang="$1"
  local file="$2"
  echo ""
  echo "### \`${file}\`"
  echo "\`\`\`${lang}"
  cat
  echo "\`\`\`"
}

cat <<'MD'
# Hakmem free-path review packet (compact)

Goal: understand remaining fixed costs vs mimalloc/tcmalloc, with Box Theory (single boundary, reversible ENV gates).

SSOT bench conditions (current practice):
- `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE`
- `ITERS=20000000 WS=400 RUNS=10`
- run via `scripts/run_mixed_10_cleanenv.sh`

Request:
1) Where is the dominant fixed cost on free path now?
2) What structural change would give +5–10% without breaking Box Theory?
3) What NOT to do (layout tax pitfalls)?
MD

echo ""
echo "## Code excerpts (clipped)"

# We focus on the hot tiny-free pipeline (the most actionable for instruction/branch work).
# If the reviewer needs wrapper/registry code too, we can provide a larger packet.

# A) tiny_free_gate_try_fast(): user_ptr -> class_idx/base -> tiny_hot_free_fast()/fallback
extract_func core/box/tiny_free_gate_box.h '^static inline int tiny_free_gate_try_fast\\(void\\* user_ptr\\)' | md_code c core/box/tiny_free_gate_box.h

# B) free_tiny_fast(): main Tiny free dispatcher (hot/cold + env snapshot)
extract_func_n_clip core/front/malloc_tiny_fast.h '^static inline int free_tiny_fast\\(void\\* ptr\\)' 1 220 | md_code c core/front/malloc_tiny_fast.h

# C) tiny_hot_free_fast(): TLS unified cache push
extract_func core/box/tiny_front_hot_box.h '^static inline int tiny_hot_free_fast\\(int class_idx, void\\* base\\)' | md_code c core/box/tiny_front_hot_box.h

# D) tiny_legacy_fallback_free_base_with_env(): inline-slots cascade + unified_cache_push(_fast)
extract_func_n_clip core/box/tiny_legacy_fallback_box.h '^static inline void tiny_legacy_fallback_free_base_with_env\\(void\\* base, uint32_t class_idx, const HakmemEnvSnapshot\\* env\\)' 1 260 | md_code c core/box/tiny_legacy_fallback_box.h

cat <<'MD'

## Questions to answer (please be concrete)

1) In these snippets, which checks/branches are still "per-op fixed taxes" on the hot free path?
   - Please point to specific lines/conditions and estimate cost (branches/instructions or dependency chain).

2) Is `tiny_hot_free_fast()` already close to optimal, and the real bottleneck is upstream (user->base/classify/route)?
   - If yes, what’s the smallest structural refactor that removes that upstream fixed tax?

3) Should we introduce a "commit once" plan (freeze the chosen free path), or is branch prediction already making lazy-init checks ~free here?
   - If "commit once", where should it live to avoid runtime gate overhead (bench_profile refresh boundary vs per-op)?

4) We have had many layout-tax regressions from code removal/reordering.
   - What patterns here are most likely to trigger layout tax if changed?
   - How would you stage a safe A/B (same binary, ENV toggle) for your proposal?

5) If you could change just ONE of:
   - pointer classification to base/class_idx,
   - route determination,
   - unified cache push/pop structure,
   which is highest ROI for +5–10% on WS=400?

MD

echo ""
echo "[packet] done"
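The brace-counting extraction used by `extract_func_n_clip` can be tried in isolation with a minimal sketch like the one below. The sample C file and the names `demo_free`, `other`, `after`, and `extract_demo` are hypothetical and exist only for this illustration; they are not part of the hakmem tree. The regex is written with `\\(` because `awk -v` processes backslash escapes, so awk sees `\(` (a literal parenthesis). The counter is purely textual, so braces inside strings or comments would also be counted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample input (assumption: not a real hakmem source file).
tmp="$(mktemp)"
trap 'rm -f "${tmp}"' EXIT
cat > "${tmp}" <<'C'
static inline int other(void) { return 0; }

static inline int demo_free(void* ptr)
{
    if (!ptr) { return 0; }
    return 1;
}

static inline int after(void) { return 2; }
C

# Same idea as extract_func_n_clip: start printing at the first line matching
# the regex, track {/} depth, and stop once the depth returns to zero.
extract_demo() {
  awk -v re="$2" '
    function count_char(s, c,   i, n) {
      n = 0
      for (i = 1; i <= length(s); i++) if (substr(s, i, 1) == c) n++
      return n
    }
    !started && $0 ~ re { started = 1 }
    started {
      print
      opens = count_char($0, "{")
      depth += opens
      if (opens > 0) seen_open = 1
      depth -= count_char($0, "}")
      if (seen_open && depth <= 0) exit 0
    }
  ' "$1"
}

result="$(extract_demo "${tmp}" '^static inline int demo_free\\(void\\* ptr\\)')"
printf '%s\n' "${result}"
```

Run against the sample, this prints only the `demo_free` definition, from its signature line through its closing brace; the one-line functions before and after it are skipped because the regex is anchored to the signature.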