From e4c5f05355281be909e13eccbb82bcd5e00071ed Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Thu, 18 Dec 2025 22:05:34 +0900
Subject: [PATCH] Phase 86: Free Path Legacy Mask (NO-GO, +0.25%)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

Implemented the Phase 86 "mask-only commit" optimization for the free path:
- Bitset mask (0x7f for C0-C6) to identify LEGACY classes
- Direct call to tiny_legacy_fallback_free_base_with_env()
- No indirect function pointers (avoids Phase 85's -0.86% regression)
- Fail-fast on LARSON_FIX=1 (incompatible with cross-thread validation)

## Results (10-run SSOT)

**NO-GO**: +0.25% improvement (threshold: +1.0%)
- Control: 51,750,467 ops/s (CV: 2.26%)
- Treatment: 51,881,055 ops/s (CV: 2.32%)
- Delta: +0.25% (mean), -0.15% (median)

## Root Cause

Competing optimizations plateau:
1. Phase 9/10 MONO LEGACY (+1.89%) already captures most of the free-path benefit
2. The remaining margin is insufficient to overcome:
   - Two branch checks (mask_enabled + has_class)
   - I-cache layout tax in the hot path
   - Direct function call overhead

## Phase 85 vs Phase 86

| Metric | Phase 85 | Phase 86 |
|--------|----------|----------|
| Approach | Indirect calls + table | Bitset mask + direct call |
| Result | -0.86% | +0.25% |
| Verdict | NO-GO (regression) | NO-GO (insufficient) |

Phase 86 correctly avoided the indirect-call penalty but revealed an architectural limit: the Phase 9/10 overlay cannot be bypassed without restructuring.

## Recommendation

The free path optimization layer has reached a practical ceiling:
- Phase 9/10 +1.89% + Phase 6/19/FASTLANE +16-27% ≈ 18-29% total
- Further attempts at ceremony elimination face the same constraints
- Recommend focusing on other optimization layers (malloc, etc.)
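The following simplified C sketch shows the two dispatch shapes compared in the Phase 85 vs Phase 86 table. It illustrates stated assumptions and is not the hakmem implementation:

- `legacy_free_handler`, `slow_free_path`, `refresh_legacy_mask`, `NUM_TINY_CLASSES` and the counters are hypothetical stand-ins. The real handler is `tiny_legacy_fallback_free_base_with_env()`.
- The table covers all eight classes rather than only C4-C7.
- The example input assumes C7 is the only non-LEGACY class, which reproduces the 0x7f mask for C0-C6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real hakmem symbols
 * (tiny_legacy_fallback_free_base_with_env(), g_free_legacy_mask_enabled,
 * g_free_legacy_mask). Counters replace real frees so the sketch is testable. */
#define NUM_TINY_CLASSES 8u

static unsigned g_legacy_frees = 0; /* frees dispatched to the LEGACY handler */
static unsigned g_slow_frees = 0;   /* frees that fell through to the full path */

static void legacy_free_handler(void* base, uint32_t class_idx) {
    (void)base; (void)class_idx;
    g_legacy_frees++;
}

static void slow_free_path(void* base, uint32_t class_idx) {
    (void)base; (void)class_idx;
    g_slow_frees++;
}

/* Phase 86 style state: one byte, bit i set => class i uses the LEGACY route. */
static uint8_t g_mask_enabled = 0;
static uint8_t g_legacy_mask = 0;

/* Phase 85 style state (for contrast): per-class function-pointer table. */
typedef void (*free_handler_fn)(void* base, uint32_t class_idx);
static free_handler_fn g_handler_table[NUM_TINY_CLASSES];

/* Init-time refresh: every class NOT in nonlegacy_mask is treated as LEGACY. */
static void refresh_legacy_mask(uint8_t nonlegacy_mask) {
    uint8_t mask = 0;
    for (uint32_t c = 0; c < NUM_TINY_CLASSES; c++) {
        int legacy = !(nonlegacy_mask & (1u << c));
        if (legacy) {
            mask |= (uint8_t)(1u << c);
        }
        g_handler_table[c] = legacy ? legacy_free_handler : NULL;
    }
    g_legacy_mask = mask;
    g_mask_enabled = (mask != 0);
}

/* Phase 86 hot path: two branches (enabled flag + class bit), then a direct call. */
static inline void free_fast_mask(void* base, uint32_t class_idx) {
    assert(class_idx < NUM_TINY_CLASSES);
    if (g_mask_enabled && ((g_legacy_mask >> class_idx) & 1u)) {
        legacy_free_handler(base, class_idx); /* direct call: target fixed at compile time */
        return;
    }
    slow_free_path(base, class_idx);
}

/* Phase 85 hot path (for contrast): table load plus an indirect call. */
static inline void free_fast_table(void* base, uint32_t class_idx) {
    assert(class_idx < NUM_TINY_CLASSES);
    free_handler_fn h = g_handler_table[class_idx];
    if (h != NULL) {
        h(base, class_idx); /* indirect call through a function pointer */
        return;
    }
    slow_free_path(base, class_idx);
}
```

Calling `refresh_legacy_mask(0x80)` sends C0-C6 through the direct call and lets C7 fall through to the full path. `free_fast_table` reaches the same handler through a load plus an indirect call. The sketch only shows the shape of each hot path. It does not show which one is faster on a given CPU; that is what the 10-run A/B measured.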
## Files Changed

### New
- core/box/free_path_legacy_mask_box.h (API + globals)
- core/box/free_path_legacy_mask_box.c (refresh logic)

### Modified
- core/bench_profile.h (added refresh call)
- core/front/malloc_tiny_fast.h (added Phase 86 fast path check)
- Makefile (added object files)
- CURRENT_TASK.md (documented result)

All changes are conditional on HAKMEM_FREE_PATH_LEGACY_MASK=1 (default OFF).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5
---
 CURRENT_TASK.md                               |  23 +
 Makefile                                      |   6 +-
 core/bench_profile.h                          |   6 +
 core/box/free_path_commit_once_fixed_box.c    | 105 ++++
 core/box/free_path_commit_once_fixed_box.h    |  49 ++
 core/box/free_path_legacy_mask_box.c          |  88 +++
 core/box/free_path_legacy_mask_box.h          |  46 ++
 core/front/malloc_tiny_fast.h                 |  35 ++
 docs/analysis/BENCH_REPRODUCIBILITY_SSOT.md   |   7 +
 .../FREE_PATH_REVIEW_PACKET_CHATGPT.md        | 555 ++++++++++++++++++
 .../PHASE85_FREE_PATH_COMMIT_ONCE_PLAN.md     | 394 +++++++++++++
 .../PHASE85_FREE_PATH_COMMIT_ONCE_RESULTS.md  |  68 +++
 docs/analysis/RESEARCH_BOXES_SSOT.md          |   8 +
 hakmem.d                                      |   4 +
 scripts/make_chatgpt_pro_packet_free_path.sh  | 127 ++++
 15 files changed, 1518 insertions(+), 3 deletions(-)
 create mode 100644 core/box/free_path_commit_once_fixed_box.c
 create mode 100644 core/box/free_path_commit_once_fixed_box.h
 create mode 100644 core/box/free_path_legacy_mask_box.c
 create mode 100644 core/box/free_path_legacy_mask_box.h
 create mode 100644 docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md
 create mode 100644 docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_PLAN.md
 create mode 100644 docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_RESULTS.md
 create mode 100755 scripts/make_chatgpt_pro_packet_free_path.sh

diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index f568345f..c17749f5 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -1,5 +1,22 @@
 # CURRENT_TASK (Rolling, SSOT)
 
+## Phase 86 (Finished: NO-GO)
+
+**Status**: ❌ NO-GO (+0.25% improvement, threshold: +1.0%)
+
+**A/B Test (10-run SSOT)**:
+- Control: 51,750,467 ops/s (CV: 2.26%)
+- Treatment: 51,881,055 ops/s (CV: 2.32%)
+- Delta: +0.25% (mean), -0.15% (median)
+
+**Summary**: Free path legacy mask (mask-only) optimization for LEGACY classes.
+- Design: Bitset mask + direct call (avoids Phase 85's indirect call problems)
+- Implementation: Correct (0x7f mask computed, C0-C6 optimized)
+- Root cause: Competing Phase 9/10 optimizations (+1.89%) already capture most of the benefit
+- Conclusion: The free path optimization layer has reached a practical ceiling
+
+---
+
 ## 0) Current source of truth (SSOT)
 
 - **Source of truth for performance comparisons**: FAST PGO build (`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`) + **WarmPool=16**
@@ -29,6 +46,7 @@
   - Keep reproduction logs (the minimum when chasing gains of a few %):
     - `scripts/bench_ssot_capture.sh`
     - `HAKMEM_BENCH_ENV_LOG=1` (records CPU governor/EPP/freq)
+  - External consultation (paste-in packet): `docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md` (generated by `scripts/make_chatgpt_pro_packet_free_path.sh`)
 
 ## 0b) Allocator comparison (reference)
 
@@ -87,6 +105,11 @@
 - **Phase 82 (hardening)**: C2 local cache fully excluded from the hot path (even with the environment variable set, the alloc/free hot paths never enter it)
   - Record: `docs/analysis/PHASE82_C2_LOCAL_CACHE_HOTPATH_EXCLUSION.md`
 
+- **Phase 85 (Free path commit-once, LEGACY-only)**: `HAKMEM_FREE_PATH_COMMIT_ONCE=0/1`
+  - Result: **NO-GO (-0.86%)** → **research box freeze (default OFF)**
+  - Reason: its effect overlapped with Phase 10 (MONO LEGACY DIRECT), and it added indirect-call/layout tax on top
+  - Record: `docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_RESULTS.md`
+
 ## 4) Next instructions (Active)
 
 ### Phase 74 (structural): Shorten the UnifiedCache hit path ✅ **P1 (LOCALIZE) frozen**
diff --git a/Makefile b/Makefile
index 543b9597..ed21c848 100644
--- a/Makefile
+++ b/Makefile
@@ -253,7 +253,7 @@ LDFLAGS += $(EXTRA_LDFLAGS)
 
 # Targets
 TARGET = test_hakmem
-OBJS_BASE = hakmem.o
hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o +OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o 
core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o 
core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o OBJS = $(OBJS_BASE) # Shared library @@ -287,7 +287,7 @@ endif # Benchmark targets BENCH_HAKMEM = bench_allocators_hakmem BENCH_SYSTEM = bench_allocators_system -BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o 
hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o 
core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o bench_allocators_hakmem.o +BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o 
hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o 
core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o bench_allocators_hakmem.o BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o @@ -464,7 +464,7 @@ test-box-refactor: box-refactor ./larson_hakmem 10 8 128 1024 1 12345 4 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem) -TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o 
core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o 
hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o +TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o 
core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o 
hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
 TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
 ifeq ($(POOL_TLS_PHASE1),1)
 TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
diff --git a/core/bench_profile.h b/core/bench_profile.h
index 665490b9..78a4a82c 100644
--- a/core/bench_profile.h
+++ b/core/bench_profile.h
@@ -17,6 +17,8 @@
 #include "box/fastlane_direct_env_box.h" // fastlane_direct_env_refresh_from_env (Phase 19-1)
 #include "box/tiny_header_hotfull_env_box.h" // tiny_header_hotfull_env_refresh_from_env (Phase 21)
 #include "box/tiny_inline_slots_fixed_mode_box.h" // tiny_inline_slots_fixed_mode_refresh_from_env (Phase 78-1)
+#include "box/free_path_commit_once_fixed_box.h" // free_path_commit_once_refresh_from_env (Phase 85)
+#include "box/free_path_legacy_mask_box.h" // free_path_legacy_mask_refresh_from_env (Phase 86)
 #endif
 
 // Set default values only when the env var is unset
@@ -235,5 +237,9 @@ static inline void bench_apply_profile(void) {
     tiny_header_hotfull_env_refresh_from_env();
     // Phase 78-1: Optionally pin C3/C4/C5/C6 inline-slots modes (avoid per-op ENV gates).
     tiny_inline_slots_fixed_mode_refresh_from_env();
+    // Phase 85: Optionally commit-once for C4-C7 LEGACY free path (skip policy/route/mono ceremony).
+    free_path_commit_once_refresh_from_env();
+    // Phase 86: Optionally use legacy mask for early exit (no indirect calls, just bit test).
+    free_path_legacy_mask_refresh_from_env();
 #endif
 }
diff --git a/core/box/free_path_commit_once_fixed_box.c b/core/box/free_path_commit_once_fixed_box.c
new file mode 100644
index 00000000..5cfd9d6d
--- /dev/null
+++ b/core/box/free_path_commit_once_fixed_box.c
@@ -0,0 +1,105 @@
+// free_path_commit_once_fixed_box.c - Phase 85: Free Path Commit-Once (LEGACY-only)
+
+#include "free_path_commit_once_fixed_box.h"
+
+#include
+#include
+#include "tiny_route_env_box.h"
+#include "free_policy_fast_v2_box.h"
+#include "tiny_legacy_fallback_box.h"
+#include "hakmem_build_flags.h"
+
+#define TINY_C4 4
+#define TINY_C7 7
+
+// ============================================================================
+// Global state
+// ============================================================================
+
+uint8_t g_free_path_commit_once_enabled = 0;
+struct FreePatchCommitOnceEntry g_free_path_commit_once_entries[4] = {0};
+
+// ============================================================================
+// Refresh from ENV (called by bench_profile)
+// ============================================================================
+
+void free_path_commit_once_refresh_from_env(void) {
+    // 1. Read master ENV gate
+    const char* env_val = getenv("HAKMEM_FREE_PATH_COMMIT_ONCE");
+    int requested = (env_val && *env_val && *env_val != '0') ? 1 : 0;
+
+    if (!requested) {
+        g_free_path_commit_once_enabled = 0;
+        return;
+    }
+
+    // 2. Fail-fast: LARSON_FIX incompatible with commit-once
+    //    owner_tid validation must happen on every free, cannot commit-once
+    const char* larson_env = getenv("HAKMEM_TINY_LARSON_FIX");
+    int larson_fix_enabled = (larson_env && *larson_env && *larson_env != '0') ? 1 : 0;
+
+    if (larson_fix_enabled) {
+#if !HAKMEM_BUILD_RELEASE
+        fprintf(stderr, "[FREE_PATH_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n");
+        fflush(stderr);
+#endif
+        g_free_path_commit_once_enabled = 0;
+        return;
+    }
+
+    // 3. Ensure route snapshot is initialized
+    tiny_route_snapshot_init();
+
+    // 4. Get nonlegacy mask (classes that use ULTRA/MID/V7)
+    uint8_t nonlegacy_mask = free_policy_fast_v2_nonlegacy_mask();
+
+    // 5. For each C4-C7 class, determine if it can commit-once
+    //    Commit-once is safe if:
+    //    - Class is NOT in nonlegacy_mask (implies LEGACY route)
+    //    - Route snapshot confirms TINY_ROUTE_LEGACY
+    for (int i = 0; i < 4; i++) {
+        unsigned class_idx = TINY_C4 + i;
+        struct FreePatchCommitOnceEntry* entry = &g_free_path_commit_once_entries[i];
+
+        // Initialize entry
+        entry->can_commit = 0;
+        entry->handler = NULL;
+
+        // Check if class is in nonlegacy mask
+        if ((nonlegacy_mask & (1u << class_idx)) != 0) {
+            // Class uses non-legacy path (ULTRA/MID/V7)
+            continue;
+        }
+
+        // Check route snapshot
+        tiny_route_kind_t route = tiny_route_for_class((uint8_t)class_idx);
+        if (route != TINY_ROUTE_LEGACY) {
+            // Unexpected route (should not happen if nonlegacy_mask is correct)
+#if !HAKMEM_BUILD_RELEASE
+            fprintf(stderr, "[FREE_PATH_COMMIT_ONCE] FAIL-FAST: C%u route=%d not LEGACY, disabling\n",
+                    class_idx, (int)route);
+            fflush(stderr);
+#endif
+            g_free_path_commit_once_enabled = 0;
+            return;
+        }
+
+        // Route is LEGACY and class not in nonlegacy_mask: safe to commit-once
+        entry->can_commit = 1;
+        entry->handler = tiny_legacy_fallback_free_base_with_env;
+
+#if !HAKMEM_BUILD_RELEASE
+        fprintf(stderr, "[FREE_PATH_COMMIT_ONCE] C%u committed (handler=%p)\n",
+                class_idx, (void*)entry->handler);
+        fflush(stderr);
+#endif
+    }
+
+    // 6. All checks passed, enable commit-once
+    g_free_path_commit_once_enabled = 1;
+
+#if !HAKMEM_BUILD_RELEASE
+    fprintf(stderr, "[FREE_PATH_COMMIT_ONCE] Enabled (nonlegacy_mask=0x%02x, LARSON_FIX=0)\n", nonlegacy_mask);
+    fflush(stderr);
+#endif
+}
diff --git a/core/box/free_path_commit_once_fixed_box.h b/core/box/free_path_commit_once_fixed_box.h
new file mode 100644
index 00000000..31c8bf69
--- /dev/null
+++ b/core/box/free_path_commit_once_fixed_box.h
@@ -0,0 +1,49 @@
+// free_path_commit_once_fixed_box.h - Phase 85: Free Path Commit-Once (LEGACY-only)
+//
+// Goal: Eliminate per-operation policy/route/mono ceremony overhead for C4-C7 LEGACY classes
+//       by pre-computing route+handler at init-time.
+//
+// Design (Box Theory, adapted from Phase 78-1):
+// - Single boundary: bench_profile calls free_path_commit_once_refresh_from_env()
+//   after applying presets.
+// - Cache: Pre-compute for each C4-C7 class whether it can use commit-once path
+//   (must be LEGACY route AND LARSON_FIX disabled)
+// - Hot path: If commit-once enabled and class in commit set, skip Phase 9/10/policy/route
+//   ceremony and call handler directly.
+// - Reversible: toggle HAKMEM_FREE_PATH_COMMIT_ONCE=0/1.
+//
+// Fail-fast: If HAKMEM_TINY_LARSON_FIX=1, disable commit-once (owner_tid validation
+//            incompatible with early exit).
+// +// ENV: +// - HAKMEM_FREE_PATH_COMMIT_ONCE=0/1 (default 0) + +#ifndef HAK_BOX_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H +#define HAK_BOX_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H + +#include +#include "tiny_route_env_box.h" + +// Forward declaration: handler function pointer +typedef void (*FreeTinyHandler)(void* base, uint32_t class_idx, const struct HakmemEnvSnapshot* env); + +// Cached entry for a single class (C4-C7) +struct FreePatchCommitOnceEntry { + uint8_t can_commit; // 1 if this class can use commit-once, 0 otherwise + FreeTinyHandler handler; // Handler function pointer (if can_commit=1) +}; + +// Refresh (single boundary): bench_profile calls this after putenv defaults. +void free_path_commit_once_refresh_from_env(void); + +// Cached state (read in hot path). +extern uint8_t g_free_path_commit_once_enabled; +extern struct FreePatchCommitOnceEntry g_free_path_commit_once_entries[4]; // C4-C7 + +// Fast-path API (inlined) +__attribute__((always_inline)) +static inline int free_path_commit_once_enabled_fast(void) { + return (int)g_free_path_commit_once_enabled; +} + +#endif // HAK_BOX_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H diff --git a/core/box/free_path_legacy_mask_box.c b/core/box/free_path_legacy_mask_box.c new file mode 100644 index 00000000..5cc4478c --- /dev/null +++ b/core/box/free_path_legacy_mask_box.c @@ -0,0 +1,88 @@ +// free_path_legacy_mask_box.c - Phase 86: Free Path Legacy Mask (mask-only) + +#include "free_path_legacy_mask_box.h" + +#include +#include +#include "tiny_route_env_box.h" +#include "free_policy_fast_v2_box.h" +#include "tiny_c7_ultra_box.h" +#include "hakmem_build_flags.h" + +#define TINY_C0 0 +#define TINY_C7 7 + +// ============================================================================ +// Global state +// ============================================================================ + +uint8_t g_free_legacy_mask_enabled = 0; +uint8_t g_free_legacy_mask = 0; + +// 
============================================================================ +// Refresh from ENV (called by bench_profile) +// ============================================================================ + +void free_path_legacy_mask_refresh_from_env(void) { + // 1. Read master ENV gate + const char* env_val = getenv("HAKMEM_FREE_PATH_LEGACY_MASK"); + int requested = (env_val && *env_val && *env_val != '0') ? 1 : 0; + + if (!requested) { + g_free_legacy_mask_enabled = 0; + return; + } + + // 2. Fail-fast: LARSON_FIX incompatible + // owner_tid validation must happen on every free, cannot commit-once + const char* larson_env = getenv("HAKMEM_TINY_LARSON_FIX"); + int larson_fix_enabled = (larson_env && *larson_env && *larson_env != '0') ? 1 : 0; + + if (larson_fix_enabled) { +#if !HAKMEM_BUILD_RELEASE + fprintf(stderr, "[FREE_LEGACY_MASK] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n"); + fflush(stderr); +#endif + g_free_legacy_mask_enabled = 0; + return; + } + + // 3. Ensure route snapshot is initialized + tiny_route_snapshot_init(); + + // 4. Get nonlegacy mask (classes that use ULTRA/MID/V7) + uint8_t nonlegacy_mask = free_policy_fast_v2_nonlegacy_mask(); + + // 5. Check if C7 ULTRA is enabled (special case: C7 has ULTRA fast path) + int c7_ultra_enabled = tiny_c7_ultra_enabled_env(); + + // 6. 
Compute legacy_mask: bit i = 1 if class i is LEGACY (not in nonlegacy_mask) + // and route confirms LEGACY + uint8_t mask = 0; + for (unsigned i = TINY_C0; i <= TINY_C7; i++) { + // Skip if class is in non-legacy mask (ULTRA/MID/V7 active) + if (nonlegacy_mask & (1u << i)) { + continue; + } + + // Skip if C7 and ULTRA is enabled (C7 ULTRA has dedicated fast path) + if (i == 7 && c7_ultra_enabled) { + continue; + } + + // Check route snapshot + tiny_route_kind_t route = tiny_route_for_class((uint8_t)i); + if (route == TINY_ROUTE_LEGACY) { + mask |= (1u << i); + } + } + + g_free_legacy_mask = mask; + g_free_legacy_mask_enabled = 1; + +#if !HAKMEM_BUILD_RELEASE + fprintf(stderr, "[FREE_LEGACY_MASK] enabled=1 mask=0x%02x nonlegacy=0x%02x c7_ultra=%d larson=0\n", + mask, nonlegacy_mask, c7_ultra_enabled); + fflush(stderr); +#endif +} diff --git a/core/box/free_path_legacy_mask_box.h b/core/box/free_path_legacy_mask_box.h new file mode 100644 index 00000000..6929d82a --- /dev/null +++ b/core/box/free_path_legacy_mask_box.h @@ -0,0 +1,46 @@ +// free_path_legacy_mask_box.h - Phase 86: Free Path Legacy Mask (mask-only, no indirect calls) +// +// Goal: Achieve Phase 10 effect (skip ceremony for LEGACY classes) with lower cost by: +// - Computing legacy_mask at init-time (bench_profile boundary) +// - Avoiding indirect call overhead (no function pointers) +// - Single direct call to tiny_legacy_fallback_free_base_with_env() +// - No table lookups in hot path (just bit test) +// +// Design (Box Theory): +// - Single boundary: bench_profile calls free_path_legacy_mask_refresh_from_env() +// after applying presets (putenv defaults). +// - Cache: legacy_mask (bitset, 1 bit per class C0-C7) +// - Hot path: If enabled and (mask & (1 << class_idx)), skip policy/route/mono ceremony +// and call tiny_legacy_fallback_free_base_with_env() directly. +// - Reversible: toggle HAKMEM_FREE_PATH_LEGACY_MASK=0/1. 
+// +// Fail-fast: If HAKMEM_TINY_LARSON_FIX=1, disable (cross-thread owner_tid validation needed). +// +// ENV: +// - HAKMEM_FREE_PATH_LEGACY_MASK=0/1 (default 0) + +#ifndef HAK_BOX_FREE_PATH_LEGACY_MASK_BOX_H +#define HAK_BOX_FREE_PATH_LEGACY_MASK_BOX_H + +#include + +// Refresh (single boundary): bench_profile calls this after putenv defaults. +void free_path_legacy_mask_refresh_from_env(void); + +// Cached state (read in hot path). +extern uint8_t g_free_legacy_mask_enabled; +extern uint8_t g_free_legacy_mask; // Bitset: bit i = 1 if class i is LEGACY and can skip ceremony + +// Fast-path API (inlined, no fallback needed). +__attribute__((always_inline)) +static inline int free_path_legacy_mask_enabled_fast(void) { + return (int)g_free_legacy_mask_enabled; +} + +__attribute__((always_inline)) +static inline int free_path_legacy_mask_has_class(unsigned class_idx) { + if (__builtin_expect(class_idx >= 8, 0)) return 0; + return (g_free_legacy_mask & (1u << class_idx)) ? 1 : 0; +} + +#endif // HAK_BOX_FREE_PATH_LEGACY_MASK_BOX_H diff --git a/core/front/malloc_tiny_fast.h b/core/front/malloc_tiny_fast.h index 4b66c325..d855b2ec 100644 --- a/core/front/malloc_tiny_fast.h +++ b/core/front/malloc_tiny_fast.h @@ -74,6 +74,8 @@ #include "../box/free_cold_shape_stats_box.h" // Phase 5 E5-3a: Free cold shape stats #include "../box/free_tiny_fast_mono_dualhot_env_box.h" // Phase 9: MONO DUALHOT ENV gate #include "../box/free_tiny_fast_mono_legacy_direct_env_box.h" // Phase 10: MONO LEGACY DIRECT ENV gate +#include "../box/free_path_commit_once_fixed_box.h" // Phase 85: Free path commit-once (LEGACY-only) +#include "../box/free_path_legacy_mask_box.h" // Phase 86: Free path legacy mask (mask-only, no indirect calls) #include "../box/alloc_passdown_ssot_env_box.h" // Phase 60: Alloc pass-down SSOT // Helper: current thread id (low 32 bits) for owner check @@ -955,6 +957,39 @@ static inline int free_tiny_fast(void* ptr) { // Phase 19-3b: Consolidate ENV snapshot reads (capture 
once per free_tiny_fast call). const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ? hakmem_env_snapshot() : NULL; + // Phase 86: Free path legacy mask - Direct early exit for LEGACY classes (no indirect calls) + // Conditions: + // - ENV: HAKMEM_FREE_PATH_LEGACY_MASK=1 + // - class_idx in legacy_mask (LEGACY route, not ULTRA/MID/V7) + // - LARSON_FIX=0 (checked at startup, fail-fast if enabled) + if (__builtin_expect(free_path_legacy_mask_enabled_fast(), 0)) { + if (__builtin_expect(free_path_legacy_mask_has_class((unsigned)class_idx), 0)) { + // Direct path: Call legacy handler without policy snapshot, route, or mono checks + tiny_legacy_fallback_free_base_with_env(base, (uint32_t)class_idx, env); + return 1; + } + } + + // Phase 85: Free path commit-once (LEGACY-only) - Skip policy/route/mono ceremony for committed C4-C7 + // Conditions: + // - ENV: HAKMEM_FREE_PATH_COMMIT_ONCE=1 + // - class_idx in C4-C7 (129-256B LEGACY classes) + // - Pre-computed at startup that class can use commit-once + // - LARSON_FIX=0 (checked at startup, fail-fast if enabled) + if (__builtin_expect(free_path_commit_once_enabled_fast(), 0)) { + if (__builtin_expect((unsigned)class_idx >= 4u && (unsigned)class_idx <= 7u, 0)) { + const unsigned cache_idx = (unsigned)class_idx - 4u; + const struct FreePatchCommitOnceEntry* entry = &g_free_path_commit_once_entries[cache_idx]; + + if (__builtin_expect(entry->can_commit, 0)) { + // Direct path: Call handler without policy snapshot, route, or mono checks + FREE_PATH_STAT_INC(commit_once_hit); + entry->handler(base, (uint32_t)class_idx, env); + return 1; + } + } + } + // Phase 9: MONO DUALHOT early-exit for C0-C3 (skip policy snapshot, direct to legacy) // Conditions: // - ENV: HAKMEM_FREE_TINY_FAST_MONO_DUALHOT=1 diff --git a/docs/analysis/BENCH_REPRODUCIBILITY_SSOT.md b/docs/analysis/BENCH_REPRODUCIBILITY_SSOT.md index 6e6af78a..523252fc 100644 --- a/docs/analysis/BENCH_REPRODUCIBILITY_SSOT.md +++ 
b/docs/analysis/BENCH_REPRODUCIBILITY_SSOT.md @@ -46,3 +46,10 @@ Allocator comparisons mix in layout tax, so treat them as **reference** only. left as-is, the benchmark can drift toward the "slow side". First, only compare runs whose `HAKMEM_BENCH_ENV_LOG=1` output shows the **same** conditions. + +## 6) External review (paste-in packet) + +For the "compress the code and paste it" use case, use the packet generator to avoid redoing the manual work each time: + +- Generator script: `scripts/make_chatgpt_pro_packet_free_path.sh` +- Generated output (snapshot): `docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md` diff --git a/docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md b/docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md new file mode 100644 index 00000000..abf2a597 --- /dev/null +++ b/docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md @@ -0,0 +1,555 @@ + + +# Hakmem free-path review packet (compact) + +Goal: understand remaining fixed costs vs mimalloc/tcmalloc, with Box Theory (single boundary, reversible ENV gates). + +SSOT bench conditions (current practice): +- `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE` +- `ITERS=20000000 WS=400 RUNS=10` +- run via `scripts/run_mixed_10_cleanenv.sh` + +Request: +1) Where is the dominant fixed cost on the free path now? +2) What structural change would give +5–10% without breaking Box Theory? +3) What NOT to do (layout tax pitfalls)?
+ +## Code excerpts (clipped) + +### `core/box/tiny_free_gate_box.h` +```c +static inline int tiny_free_gate_try_fast(void* user_ptr) +{ +#if !HAKMEM_TINY_HEADER_CLASSIDX + (void)user_ptr; + // With headers disabled, the Tiny Fast Path itself is not used + return 0; +#else + if (__builtin_expect(!user_ptr, 0)) { + return 0; + } + + // Layer 3a: lightweight fail-fast (always ON) + // Obviously invalid addresses (extremely small values) are not handled on the fast path. + // They are left to the slow path (hak_free_at + registry/header). + { + uintptr_t addr = (uintptr_t)user_ptr; + if (__builtin_expect(addr < 4096, 0)) { +#if !HAKMEM_BUILD_RELEASE + static _Atomic uint32_t g_free_gate_range_invalid = 0; + uint32_t n = atomic_fetch_add_explicit(&g_free_gate_range_invalid, 1, memory_order_relaxed); + if (n < 8) { + fprintf(stderr, + "[TINY_FREE_GATE_RANGE_INVALID] ptr=%p\n", + user_ptr); + fflush(stderr); + } +#endif + return 0; + } + } + + // Future extension point: + // - Run Bridge + Guard only when DIAG is ON, and + // skip the fast path when the pointer is judged to be outside Tiny management. +#if !HAKMEM_BUILD_RELEASE + if (__builtin_expect(tiny_free_gate_diag_enabled(), 0)) { + TinyFreeGateContext ctx; + if (!tiny_free_gate_classify(user_ptr, &ctx)) { + // Outside Tiny management or Bridge failed → do not use the fast path + return 0; + } + (void)ctx; // Logging only for now; a Guard will be inserted here later. + } +#endif + + // Delegate the body to the existing ultra-fast free (behavior unchanged) + return hak_tiny_free_fast_v2(user_ptr); +#endif +} +``` + +### `core/front/malloc_tiny_fast.h` +```c +static inline int free_tiny_fast(void* ptr) { + if (__builtin_expect(!ptr, 0)) return 0; + +#if HAKMEM_TINY_HEADER_CLASSIDX + // 1.
Page boundary guard: + // If ptr is at the start of a page (offset==0), ptr-1 may be in another page or an unmapped region. + // In that case, skip the header read and fall back to the normal free path. + uintptr_t off = (uintptr_t)ptr & 0xFFFu; + if (__builtin_expect(off == 0, 0)) { + return 0; + } + + // 2. Fast header magic validation (required) + // In Release builds, tiny_region_id_read_header() omits the magic check, + // so validate the Tiny-specific header (0xA0) here explicitly. + uint8_t* header_ptr = (uint8_t*)ptr - 1; + uint8_t header = *header_ptr; + uint8_t magic = header & 0xF0u; + if (__builtin_expect(magic != HEADER_MAGIC, 0)) { + // Not a Tiny header → Mid/Large/foreign pointer, so take the normal free path + return 0; + } + + // 3. Extract class_idx (low 4 bits) + int class_idx = (int)(header & HEADER_CLASS_MASK); + if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) { + return 0; + } + + // 4. Compute BASE and push to the Unified Cache + void* base = tiny_user_to_base_inline(ptr); + tiny_front_free_stat_inc(class_idx); + + // Phase FREE-LEGACY-BREAKDOWN-1: counter placement (1. function entry) + FREE_PATH_STAT_INC(total_calls); + + // Phase 19-3b: Consolidate ENV snapshot reads (capture once per free_tiny_fast call). + const HakmemEnvSnapshot* env = hakmem_env_snapshot_enabled() ?
hakmem_env_snapshot() : NULL; + + // Phase 9: MONO DUALHOT early-exit for C0-C3 (skip policy snapshot, direct to legacy) + // Conditions: + // - ENV: HAKMEM_FREE_TINY_FAST_MONO_DUALHOT=1 + // - class_idx <= 3 (C0-C3) + // - !HAKMEM_TINY_LARSON_FIX (cross-thread handling requires full validation) + // - g_tiny_route_snapshot_done == 1 && route == TINY_ROUTE_LEGACY (if this cannot be confirmed, take the existing path) + if ((unsigned)class_idx <= 3u) { + if (free_tiny_fast_mono_dualhot_enabled()) { + static __thread int g_larson_fix = -1; + if (__builtin_expect(g_larson_fix == -1, 0)) { + const char* e = getenv("HAKMEM_TINY_LARSON_FIX"); + g_larson_fix = (e && *e && *e != '0') ? 1 : 0; + } + + if (!g_larson_fix && + g_tiny_route_snapshot_done == 1 && + g_tiny_route_class[class_idx] == TINY_ROUTE_LEGACY) { + // Direct path: Skip policy snapshot, go straight to legacy fallback + FREE_PATH_STAT_INC(mono_dualhot_hit); + tiny_legacy_fallback_free_base_with_env(base, (uint32_t)class_idx, env); + return 1; + } + } + } + + // Phase 10: MONO LEGACY DIRECT early-exit for C4-C7 (skip policy snapshot, direct to legacy) + // Conditions: + // - ENV: HAKMEM_FREE_TINY_FAST_MONO_LEGACY_DIRECT=1 + // - cached nonlegacy_mask: class is NOT in non-legacy mask (= ULTRA/MID/V7 not active) + // - g_tiny_route_snapshot_done == 1 && route == TINY_ROUTE_LEGACY (if this cannot be confirmed, take the existing path) + // - !HAKMEM_TINY_LARSON_FIX (cross-thread handling requires full validation) + if (free_tiny_fast_mono_legacy_direct_enabled()) { + // 1. Check nonlegacy mask (computed once at init) + uint8_t nonlegacy_mask = free_tiny_fast_mono_legacy_direct_nonlegacy_mask(); + if ((nonlegacy_mask & (1u << class_idx)) == 0) { + // 2. Check route snapshot + if (g_tiny_route_snapshot_done == 1 && g_tiny_route_class[class_idx] == TINY_ROUTE_LEGACY) { + // 3.
Check Larson fix + static __thread int g_larson_fix = -1; + if (__builtin_expect(g_larson_fix == -1, 0)) { + const char* e = getenv("HAKMEM_TINY_LARSON_FIX"); + g_larson_fix = (e && *e && *e != '0') ? 1 : 0; + } + + if (!g_larson_fix) { + // Direct path: Skip policy snapshot, go straight to legacy fallback + FREE_PATH_STAT_INC(mono_legacy_direct_hit); + tiny_legacy_fallback_free_base_with_env(base, (uint32_t)class_idx, env); + return 1; + } + } + } + } + + // Phase v11b-1: C7 ULTRA early-exit (skip policy snapshot for most common case) + // Phase 4 E1: Use ENV snapshot when enabled (consolidates 3 TLS reads → 1) + // Phase 19-3a: Remove UNLIKELY hint (snapshot is ON by default in presets, hint is backwards) + const bool c7_ultra_free = env ? env->tiny_c7_ultra_enabled : tiny_c7_ultra_enabled_env(); + + if (class_idx == 7 && c7_ultra_free) { + tiny_c7_ultra_free(ptr); + return 1; + } + + // Phase POLICY-FAST-PATH-V2: Skip policy snapshot for known-legacy classes + if (free_policy_fast_v2_can_skip((uint8_t)class_idx)) { + FREE_PATH_STAT_INC(policy_fast_v2_skip); + goto legacy_fallback; + } + + // Phase v11b-1: Policy-based single switch (replaces serial ULTRA checks) + const SmallPolicyV7* policy_free = small_policy_v7_snapshot(); + SmallRouteKind route_kind_free = policy_free->route_kind[class_idx]; + + switch (route_kind_free) { + case SMALL_ROUTE_ULTRA: { + // Phase TLS-UNIFY-1: Unified ULTRA TLS push for C4-C6 (C7 handled above) + if (class_idx >= 4 && class_idx <= 6) { + tiny_ultra_tls_push((uint8_t)class_idx, base); + return 1; + } + // ULTRA for other classes → fallback to LEGACY + break; + } + + case SMALL_ROUTE_MID_V35: { + // Phase v11a-3: MID v3.5 free + small_mid_v35_free(ptr, class_idx); + FREE_PATH_STAT_INC(smallheap_v7_fast); + return 1; + } + + case SMALL_ROUTE_V7: { + // Phase v7: SmallObject v7 free (research box) + if (small_heap_free_fast_v7_stub(ptr, (uint8_t)class_idx)) { + FREE_PATH_STAT_INC(smallheap_v7_fast); + return 1; + } + // V7 miss
→ fallback to LEGACY + break; + } + + case SMALL_ROUTE_MID_V3: { + // Phase MID-V3: delegate to MID v3.5 + small_mid_v35_free(ptr, class_idx); + FREE_PATH_STAT_INC(smallheap_v7_fast); + return 1; + } + + case SMALL_ROUTE_LEGACY: + default: + break; + } + +legacy_fallback: + // LEGACY fallback path + // Phase 19-6C: Compute route once using helper (avoid redundant tiny_route_for_class) + tiny_route_kind_t route; + int use_tiny_heap; + free_tiny_fast_compute_route_and_heap(class_idx, &route, &use_tiny_heap); + + // TWO-SPEED: SuperSlab registration check is DEBUG-ONLY to keep HOT PATH fast. + // In Release builds, we trust header magic (0xA0) as sufficient validation. +#if !HAKMEM_BUILD_RELEASE + // 5. SuperSlab registration check (prevents misclassification) + SuperSlab* ss_guard = hak_super_lookup(ptr); + if (__builtin_expect(!(ss_guard && ss_guard->magic == SUPERSLAB_MAGIC), 0)) { + return 0; // not managed by hakmem → normal free path + } +#endif // !HAKMEM_BUILD_RELEASE + + // Cross-thread free detection (Larson MT crash fix, ENV gated) + TinyHeap free path + { + static __thread int g_larson_fix = -1; + if (__builtin_expect(g_larson_fix == -1, 0)) { + const char* e = getenv("HAKMEM_TINY_LARSON_FIX"); + g_larson_fix = (e && *e && *e != '0') ? 1 : 0; +#if !HAKMEM_BUILD_RELEASE + fprintf(stderr, "[LARSON_FIX_INIT] g_larson_fix=%d (env=%s)\n", g_larson_fix, e ? e : "NULL"); + fflush(stderr); +#endif + } + + if (__builtin_expect(g_larson_fix || use_tiny_heap, 0)) { + // Phase 12 optimization: Use fast mask-based lookup (~5-10 cycles vs 50-100) + SuperSlab* ss = ss_fast_lookup(base); + // Phase FREE-LEGACY-BREAKDOWN-1: counter placement (5.
super_lookup call) + FREE_PATH_STAT_INC(super_lookup_called); + if (ss) { + int slab_idx = slab_index_for(ss, base); + if (__builtin_expect(slab_idx >= 0 && slab_idx < ss_slabs_capacity(ss), 1)) { + uint32_t self_tid = tiny_self_u32_local(); + uint8_t owner_tid_low = ss_slab_meta_owner_tid_low_get(ss, slab_idx); + TinySlabMeta* meta = &ss->slabs[slab_idx]; + // LARSON FIX: Use bits 8-15 for comparison (pthread TIDs aligned to 256 bytes) + uint8_t self_tid_cmp = (uint8_t)((self_tid >> 8) & 0xFFu); +#if !HAKMEM_BUILD_RELEASE + static _Atomic uint64_t g_owner_check_count = 0; + uint64_t oc = atomic_fetch_add(&g_owner_check_count, 1); + if (oc < 10) { + fprintf(stderr, "[LARSON_FIX] Owner check: ptr=%p owner_tid_low=0x%02x self_tid_cmp=0x%02x self_tid=0x%08x match=%d\n", + ptr, owner_tid_low, self_tid_cmp, self_tid, (owner_tid_low == self_tid_cmp)); + fflush(stderr); + } +#endif + + if (__builtin_expect(owner_tid_low != self_tid_cmp, 0)) { + // Cross-thread free → route to remote queue instead of poisoning TLS cache +#if !HAKMEM_BUILD_RELEASE + static _Atomic uint64_t g_cross_thread_count = 0; + uint64_t ct = atomic_fetch_add(&g_cross_thread_count, 1); + if (ct < 20) { + fprintf(stderr, "[LARSON_FIX] Cross-thread free detected! ptr=%p owner_tid_low=0x%02x self_tid_cmp=0x%02x self_tid=0x%08x\n", + ptr, owner_tid_low, self_tid_cmp, self_tid); + fflush(stderr); + } +#endif + if (tiny_free_remote_box(ss, slab_idx, meta, ptr, self_tid)) { + // Phase FREE-LEGACY-BREAKDOWN-1: counter placement (6.
cross-thread free) + FREE_PATH_STAT_INC(remote_free); + return 1; // handled via remote queue +``` + +### `core/box/tiny_front_hot_box.h` +```c +static inline int tiny_hot_free_fast(int class_idx, void* base) { + extern __thread TinyUnifiedCache g_unified_cache[]; + + // TLS cache access (1 cache miss) + // NOTE: Range check removed - caller guarantees valid class_idx + TinyUnifiedCache* cache = &g_unified_cache[class_idx]; + +#if HAKMEM_TINY_UNIFIED_LIFO_COMPILED + // Phase 15 v1: Mode check at entry (once per call, not scattered in hot path) + // Phase 22: Compile-out when disabled (default OFF) + int lifo_mode = tiny_unified_lifo_enabled(); + + // Phase 15 v1: LIFO vs FIFO mode switch + if (lifo_mode) { + // === LIFO MODE: Stack-based (LIFO) === + // Try push to stack (tail is stack depth) + if (unified_cache_try_push_lifo(class_idx, base)) { + #if !HAKMEM_BUILD_RELEASE + extern __thread uint64_t g_unified_cache_push[]; + g_unified_cache_push[class_idx]++; + #endif + return 1; // SUCCESS + } + // LIFO overflow → fall through to cold path + #if !HAKMEM_BUILD_RELEASE + extern __thread uint64_t g_unified_cache_full[]; + g_unified_cache_full[class_idx]++; + #endif + return 0; // FULL + } +#endif + + // === FIFO MODE: Ring-based (existing, default) === + // Calculate next tail (for full check) + uint16_t next_tail = (cache->tail + 1) & cache->mask; + + // Branch 1: Cache full check (UNLIKELY full) + // Hot path: cache has space (next_tail != head) + // Cold path: cache full (next_tail == head) → drain needed + if (TINY_HOT_LIKELY(next_tail != cache->head)) { + // === HOT PATH: Cache has space (2-3 instructions) === + + // Push to cache (1 cache miss for array write) + cache->slots[cache->tail] = base; + cache->tail = next_tail; + + // Debug metrics (zero overhead in release) + #if !HAKMEM_BUILD_RELEASE + extern __thread uint64_t g_unified_cache_push[]; + g_unified_cache_push[class_idx]++; + #endif + + return 1; // SUCCESS + } + + // === COLD PATH: Cache full ===
+ // Don't drain here - let caller handle via tiny_cold_drain_and_free() + #if !HAKMEM_BUILD_RELEASE + extern __thread uint64_t g_unified_cache_full[]; + g_unified_cache_full[class_idx]++; + #endif + + return 0; // FULL +} +``` + +### `core/box/tiny_legacy_fallback_box.h` +```c +static inline void tiny_legacy_fallback_free_base_with_env(void* base, uint32_t class_idx, const HakmemEnvSnapshot* env) { + // Phase 80-1: Switch dispatch for C4/C5/C6 (branch reduction optimization) + // Phase 83-1: Per-op branch removed via fixed-mode caching + // C2/C3 excluded (NO-GO from Phase 77-1/79-1) + if (tiny_inline_slots_switch_dispatch_enabled_fast()) { + // Switch mode: Direct jump to case (zero comparison overhead for C4/C5/C6) + switch (class_idx) { + case 4: + if (tiny_c4_inline_slots_enabled_fast()) { + if (c4_inline_push(c4_inline_tls(), base)) { + FREE_PATH_STAT_INC(legacy_fallback); + if (__builtin_expect(free_path_stats_enabled(), 0)) { + g_free_path_stats.legacy_by_class[class_idx]++; + } + return; + } + } + break; + case 5: + if (tiny_c5_inline_slots_enabled_fast()) { + if (c5_inline_push(c5_inline_tls(), base)) { + FREE_PATH_STAT_INC(legacy_fallback); + if (__builtin_expect(free_path_stats_enabled(), 0)) { + g_free_path_stats.legacy_by_class[class_idx]++; + } + return; + } + } + break; + case 6: + if (tiny_c6_inline_slots_enabled_fast()) { + if (c6_inline_push(c6_inline_tls(), base)) { + FREE_PATH_STAT_INC(legacy_fallback); + if (__builtin_expect(free_path_stats_enabled(), 0)) { + g_free_path_stats.legacy_by_class[class_idx]++; + } + return; + } + } + break; + default: + // C0-C3, C7: fall through to unified_cache push + break; + } + // Switch mode: fall through to unified_cache push after miss + } else { + // If-chain mode (Phase 80-1 baseline): C3/C4/C5/C6 sequential checks + // NOTE: C2 local cache (Phase 79-1 NO-GO) removed from hot path + + // Phase 77-1: C3 Inline Slots early-exit (ENV gated) + // Try C3 inline slots SECOND (before C4/C5/C6/unified cache) for 
class 3 + if (class_idx == 3 && tiny_c3_inline_slots_enabled_fast()) { + if (c3_inline_push(c3_inline_tls(), base)) { + // Success: pushed to C3 inline slots + FREE_PATH_STAT_INC(legacy_fallback); + if (__builtin_expect(free_path_stats_enabled(), 0)) { + g_free_path_stats.legacy_by_class[class_idx]++; + } + return; + } + // FULL → fall through to C4/C5/C6/unified cache + } + + // Phase 76-1: C4 Inline Slots early-exit (ENV gated) + // Try C4 inline slots SECOND (before C5/C6/unified cache) for class 4 + if (class_idx == 4 && tiny_c4_inline_slots_enabled_fast()) { + if (c4_inline_push(c4_inline_tls(), base)) { + // Success: pushed to C4 inline slots + FREE_PATH_STAT_INC(legacy_fallback); + if (__builtin_expect(free_path_stats_enabled(), 0)) { + g_free_path_stats.legacy_by_class[class_idx]++; + } + return; + } + // FULL → fall through to C5/C6/unified cache + } + + // Phase 75-2: C5 Inline Slots early-exit (ENV gated) + // Try C5 inline slots SECOND (before C6 and unified cache) for class 5 + if (class_idx == 5 && tiny_c5_inline_slots_enabled_fast()) { + if (c5_inline_push(c5_inline_tls(), base)) { + // Success: pushed to C5 inline slots + FREE_PATH_STAT_INC(legacy_fallback); + if (__builtin_expect(free_path_stats_enabled(), 0)) { + g_free_path_stats.legacy_by_class[class_idx]++; + } + return; + } + // FULL → fall through to C6/unified cache + } + + // Phase 75-1: C6 Inline Slots early-exit (ENV gated) + // Try C6 inline slots THIRD (before unified cache) for class 6 + if (class_idx == 6 && tiny_c6_inline_slots_enabled_fast()) { + if (c6_inline_push(c6_inline_tls(), base)) { + // Success: pushed to C6 inline slots + FREE_PATH_STAT_INC(legacy_fallback); + if (__builtin_expect(free_path_stats_enabled(), 0)) { + g_free_path_stats.legacy_by_class[class_idx]++; + } + return; + } + // FULL → fall through to unified cache + } + } // End of if-chain mode + + const TinyFrontV3Snapshot* front_snap = + env ? (env->tiny_front_v3_enabled ?
tiny_front_v3_snapshot_get() : NULL) + : (__builtin_expect(tiny_front_v3_enabled(), 0) ? tiny_front_v3_snapshot_get() : NULL); + const bool metadata_cache_on = env ? env->tiny_metadata_cache_eff : tiny_metadata_cache_enabled(); + + // Phase 3 C2 Patch 2: First page cache hint (optional fast-path) + // Check if pointer is in cached page (avoids metadata lookup in future optimizations) + if (__builtin_expect(metadata_cache_on, 0)) { + // Note: This is a hint-only check. Even if it hits, we still use the standard path. + // The cache will be populated during refill operations for future use. + // Currently this just validates the cache state; actual optimization TBD. + if (tiny_first_page_cache_hit(class_idx, base, 4096)) { + // Future: could optimize metadata access here + } + } + + // Legacy fallback - Unified Cache push + if (!front_snap || front_snap->unified_cache_on) { + // Phase 74-3 (P0): FASTAPI path (ENV-gated) + if (tiny_uc_fastapi_enabled()) { + // Preconditions guaranteed: + // - unified_cache_on == true (checked above) + // - TLS init guaranteed by front_gate_unified_enabled() in malloc_tiny_fast.h + // - Stats compiled-out in FAST builds + if (unified_cache_push_fast(class_idx, HAK_BASE_FROM_RAW(base))) { + FREE_PATH_STAT_INC(legacy_fallback); + + // Per-class breakdown (Phase 4-1) + if (__builtin_expect(free_path_stats_enabled(), 0)) { + if (class_idx < 8) { + g_free_path_stats.legacy_by_class[class_idx]++; + } + } + return; + } + // FULL → fallback to slow path (rare) + } + + // Original path (FASTAPI=0 or fallback) + if (unified_cache_push(class_idx, HAK_BASE_FROM_RAW(base))) { + FREE_PATH_STAT_INC(legacy_fallback); + + // Per-class breakdown (Phase 4-1) + if (__builtin_expect(free_path_stats_enabled(), 0)) { + if (class_idx < 8) { + g_free_path_stats.legacy_by_class[class_idx]++; + } + } + return; + } + } + + // Final fallback + tiny_hot_free_fast(class_idx, base); +} +``` + +## Questions to answer (please be concrete) + +1) In these snippets,
which checks/branches are still "per-op fixed taxes" on the hot free path? + - Please point to specific lines/conditions and estimate cost (branches/instructions or dependency chain). + +2) Is `tiny_hot_free_fast()` already close to optimal, with the real bottleneck upstream (user->base/classify/route)? + - If yes, what's the smallest structural refactor that removes that upstream fixed tax? + +3) Should we introduce a "commit once" plan (freeze the chosen free path), or is branch prediction already making lazy-init checks ~free here? + - If "commit once", where should it live to avoid runtime gate overhead (bench_profile refresh boundary vs per-op)? + +4) We have had many layout-tax regressions from code removal/reordering. + - What patterns here are most likely to trigger layout tax if changed? + - How would you stage a safe A/B (same binary, ENV toggle) for your proposal? + +5) If you could change just ONE of: + - pointer classification to base/class_idx, + - route determination, + - unified cache push/pop structure, + which has the highest ROI for +5–10% on WS=400? + + +[packet] done diff --git a/docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_PLAN.md b/docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_PLAN.md new file mode 100644 index 00000000..9b42e9b9 --- /dev/null +++ b/docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_PLAN.md @@ -0,0 +1,394 @@ +# Phase 85: Free Path Commit-Once (LEGACY-only) Implementation Plan + +## 1. Objective & Scope + +**Goal**: Eliminate per-operation policy/route/mono ceremony overhead in `free_tiny_fast()` for the LEGACY route by applying the Phase 78-1 "commit-once" pattern.
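A minimal sketch of the commit-once idea, written in the bitset + direct-call form that Phase 86 uses rather than the Phase 85 function-pointer table (the commit message credits that switch with avoiding Phase 85's -0.86% indirect-call regression). Every `sketch_` name, the two-value route enum, and the counters are hypothetical stand-ins, not hakmem APIs. The refresh function decides once which classes are provably LEGACY, mirroring the checks in `free_path_legacy_mask_refresh_from_env()`. The hot path then needs only a gate check, one bit test, and one direct call.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the hakmem route kinds and free handlers. */
#define SKETCH_NUM_CLASSES 8u
typedef enum { SKETCH_ROUTE_LEGACY = 0, SKETCH_ROUTE_OTHER = 1 } sketch_route_t;

static unsigned g_sketch_direct_calls = 0; /* frees that took the committed direct path */
static unsigned g_sketch_slow_calls = 0;   /* frees that fell through to the full ceremony */

static void sketch_legacy_free(void* base, uint32_t class_idx) {
    (void)base; (void)class_idx;
    g_sketch_direct_calls++;
}

static void sketch_full_ceremony_free(void* base, uint32_t class_idx) {
    (void)base; (void)class_idx;   /* stands in for Phase 9/10 + policy + route */
    g_sketch_slow_calls++;
}

/* Cached state: written only at the refresh boundary, read on every free. */
static uint8_t g_sketch_enabled = 0;
static uint8_t g_sketch_legacy_mask = 0; /* bit i = class i may skip the ceremony */

/* Init-time: decide once which classes are provably LEGACY. */
static void sketch_refresh(int requested, int larson_fix, uint8_t nonlegacy_mask,
                           int c7_ultra_enabled,
                           const sketch_route_t routes[SKETCH_NUM_CLASSES]) {
    g_sketch_enabled = 0;
    if (!requested || larson_fix) {
        return; /* fail-fast: LARSON_FIX needs per-op owner_tid checks */
    }
    uint8_t mask = 0;
    for (unsigned i = 0; i < SKETCH_NUM_CLASSES; i++) {
        if (nonlegacy_mask & (1u << i)) continue;   /* ULTRA/MID/V7 active */
        if (i == 7 && c7_ultra_enabled) continue;   /* C7 ULTRA has its own path */
        if (routes[i] == SKETCH_ROUTE_LEGACY) mask |= (uint8_t)(1u << i);
    }
    g_sketch_legacy_mask = mask;
    g_sketch_enabled = 1;
}

/* Hot path: gate check, one bit test, one direct (not indirect) call. */
static int sketch_free(void* base, unsigned class_idx) {
    if (g_sketch_enabled && class_idx < SKETCH_NUM_CLASSES &&
        (g_sketch_legacy_mask & (1u << class_idx))) {
        sketch_legacy_free(base, (uint32_t)class_idx);
        return 1;
    }
    sketch_full_ceremony_free(base, (uint32_t)class_idx);
    return 0;
}
```

With every class routed LEGACY and C7 ULTRA enabled, the refresh yields a mask of `0x7f` (C0-C6), which matches the mask value quoted in the commit message. Requesting LARSON_FIX leaves the gate off, so every free takes the full path.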
+ +**Target**: +2.0% improvement (GO threshold) + +**Scope**: +- LEGACY route only (classes C4-C7, size 129-256 bytes) +- Does NOT apply to ULTRA/MID/V7 routes +- Must coexist with existing Phase 9 (MONO DUALHOT) and Phase 10 (MONO LEGACY DIRECT) optimizations +- Fail-fast if HAKMEM_TINY_LARSON_FIX enabled (owner_tid validation incompatible with commit-once) + +**Strategy**: Cache Route + Handler mapping at init-time (bench_profile refresh boundary), skip 12-20 branches per free() in the hot path. + +--- + +## 2. Architecture & Design + +### 2.1 Core Pattern (Phase 78-1 Adaptation) + +Following the successful Phase 78-1 pattern: + +``` +┌─────────────────────────────────────────────────────┐ +│ Init-time (bench_profile refresh boundary) │ +│ ───────────────────────────────────────────────── │ +│ free_path_commit_once_refresh_from_env() │ +│ ├─ Read ENV: HAKMEM_FREE_PATH_COMMIT_ONCE=0/1 │ +│ ├─ Fail-fast: if LARSON_FIX enabled → disable │ +│ ├─ For C4-C7 (LEGACY classes): │ +│ │ └─ Compute: route_kind, handler function │ +│ │ └─ Store: g_free_path_commit_once_fixed[4] │ +│ └─ Set: g_free_path_commit_once_enabled = true │ +└─────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────┐ +│ Hot path (every free) │ +│ ───────────────────────────────────────────────── │ +│ free_tiny_fast() │ +│ if
(free_path_commit_once_enabled_fast()) { │ +│ // NEW: Direct dispatch, skip all ceremony │ +│ auto& cached = g_free_path_commit_once_fixed[ │ +│ class_idx - TINY_C4]; │ +│ return cached.handler(ptr, class_idx, heap); │ +│ } │ +│ // Fallback: existing Phase 9/10/policy/route │ +│ ... │ +└─────────────────────────────────────────────────────┘ +``` + +### 2.2 Cached State Structure + +```c +typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap); + +struct FreePatchCommitOnceEntry { + TinyRouteKind route_kind; // LEGACY, ULTRA, MID, V7 (validation only) + FreeTinyHandler handler; // Direct function pointer + uint8_t valid; // Safety flag +}; + +// Global state (4 entries for C4-C7) +extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4]; +extern bool g_free_path_commit_once_enabled; +``` + +### 2.3 What Gets Cached + +For each LEGACY class (C4-C7): +- **route_kind**: Expected to be `TINY_ROUTE_LEGACY` +- **handler**: Function pointer to `tiny_legacy_fallback_free_base_with_env` or another appropriate handler +- **valid**: Safety flag (1 if cache entry is valid) + +### 2.4 Eliminated Overhead + +**Before** (15-26 branches per free): +1. Phase 9 MONO DUALHOT check (3-5 branches) +2. Phase 10 MONO LEGACY DIRECT check (4-6 branches) +3. Policy snapshot call `small_policy_v7_snapshot()` (5-10 branches, potential getenv) +4. Route computation `tiny_route_for_class()` (3-5 branches) +5. Switch on route_kind (1-2 branches) + +**After** (commit-once enabled, LEGACY classes): +1. Master gate check `free_path_commit_once_enabled_fast()` (1 branch, predicted taken) +2. Class index range check (1 branch, predicted taken) +3. Cached entry lookup (0 branches, direct memory load) +4.
Direct handler dispatch (1 indirect call) + +**Branch reduction**: 12-20 branches per LEGACY free → **Estimated +2-3% improvement** + +--- + +## 3. Files to Create/Modify + +### 3.1 New Files (Box Pattern) + +#### `core/box/free_path_commit_once_fixed_box.h` +```c +#ifndef HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H +#define HAKMEM_FREE_PATH_COMMIT_ONCE_FIXED_BOX_H + +#include <stdbool.h> +#include <stdint.h> +#include "core/hakmem_tiny_defs.h" + +typedef void (*FreeTinyHandler)(void* ptr, unsigned class_idx, TinyHeap* heap); + +struct FreePatchCommitOnceEntry { + TinyRouteKind route_kind; + FreeTinyHandler handler; + uint8_t valid; +}; + +// Global cache (4 entries for C4-C7) +extern struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4]; +extern bool g_free_path_commit_once_enabled; + +// Fast-path API (inlined, no fallback needed) +static inline bool free_path_commit_once_enabled_fast(void) { + return __builtin_expect(g_free_path_commit_once_enabled, 0); +} + +// Refresh (called once at bench_profile boundary) +void free_path_commit_once_refresh_from_env(void); + +#endif +``` + +#### `core/box/free_path_commit_once_fixed_box.c` +```c +#include "free_path_commit_once_fixed_box.h" +#include "core/box/tiny_env_box.h" +#include "core/box/tiny_larson_fix_env_box.h" +#include "core/hakmem_tiny.h" +#include <stdio.h> +#include <stdlib.h> + +struct FreePatchCommitOnceEntry g_free_path_commit_once_fixed[4]; +bool g_free_path_commit_once_enabled = false; + +void free_path_commit_once_refresh_from_env(void) { + // Read master ENV gate + const char* env_val = getenv("HAKMEM_FREE_PATH_COMMIT_ONCE"); + bool requested = (env_val && atoi(env_val) == 1); + + if (!requested) { + g_free_path_commit_once_enabled = false; + return; + } + + // Fail-fast: LARSON_FIX incompatible with commit-once + if (tiny_larson_fix_enabled()) { + fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible, disabling\n"); + g_free_path_commit_once_enabled = false; + return; + } + + // Pre-compute route +
handler for C4-C7 (LEGACY) + for (unsigned i = 0; i < 4; i++) { + unsigned class_idx = TINY_C4 + i; + + // Route determination (expect LEGACY for C4-C7) + TinyRouteKind route = tiny_route_for_class(class_idx); + + // Handler selection (simplified, matches free_tiny_fast logic) + FreeTinyHandler handler = NULL; + + if (route == TINY_ROUTE_LEGACY) { + handler = tiny_legacy_fallback_free_base_with_env; + } else { + // Unexpected route, fail-fast + fprintf(stderr, "[FREE_COMMIT_ONCE] FAIL-FAST: C%u route=%d not LEGACY, disabling\n", + class_idx, (int)route); + g_free_path_commit_once_enabled = false; + return; + } + + g_free_path_commit_once_fixed[i].route_kind = route; + g_free_path_commit_once_fixed[i].handler = handler; + g_free_path_commit_once_fixed[i].valid = 1; + } + + g_free_path_commit_once_enabled = true; +} +``` + +### 3.2 Modified Files + +#### `core/front/malloc_tiny_fast.h` (free_tiny_fast function) + +**Insertion point**: Line ~950, before Phase 9/10 checks + +```c +static void free_tiny_fast(void* ptr, unsigned class_idx, TinyHeap* heap, ...) { + // NEW: Phase 85 commit-once fast path (LEGACY classes only) + #if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED + if (free_path_commit_once_enabled_fast()) { + if (class_idx >= TINY_C4 && class_idx <= TINY_C7) { + const unsigned cache_idx = class_idx - TINY_C4; + const struct FreePatchCommitOnceEntry* entry = + &g_free_path_commit_once_fixed[cache_idx]; + + if (__builtin_expect(entry->valid, 1)) { + entry->handler(ptr, class_idx, heap); + return; + } + } + } + #endif + + // Existing Phase 9/10/policy/route ceremony (fallback) + ... +} +``` + +#### `core/bench_profile.h` (refresh function integration) + +Add to `refresh_all_env_caches()`: + +```c +void refresh_all_env_caches(void) { + // ... existing refreshes ... 
+ + #if HAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED + free_path_commit_once_refresh_from_env(); + #endif +} +``` + +#### `Makefile` (box flag) + +Add new box flag: + +```makefile +BOX_FREE_PATH_COMMIT_ONCE_FIXED ?= 1 +CFLAGS += -DHAKMEM_BOX_FREE_PATH_COMMIT_ONCE_FIXED=$(BOX_FREE_PATH_COMMIT_ONCE_FIXED) +``` + +--- + +## 4. Implementation Stages + +### Stage 1: Box Infrastructure (1-2 hours) +1. Create `free_path_commit_once_fixed_box.h` with struct definition, global declarations, fast-path API +2. Create `free_path_commit_once_fixed_box.c` with refresh implementation +3. Add Makefile box flag +4. Integrate refresh call into `core/bench_profile.h` +5. **Validation**: Compile, verify no build errors + +### Stage 2: Hot Path Integration (1 hour) +1. Modify `core/front/malloc_tiny_fast.h` to add Phase 85 fast path at line ~950 +2. Add class range check (C4-C7) and cache lookup +3. Add handler dispatch with validity check +4. **Validation**: Compile, verify no build errors, run basic functionality test + +### Stage 3: Fail-Fast Safety (30 min) +1. Test LARSON_FIX=1 scenario, verify commit-once disabled +2. Test invalid route scenario (C4-C7 with non-LEGACY route) +3. **Validation**: Both scenarios should log fail-fast message and fall back to standard path + +### Stage 4: A/B Testing (2-3 hours) +1. Build single binary with box flag enabled +2. Baseline test: `HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh` +3. Treatment test: `HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh` +4. Compare mean/median/CV, calculate delta +5. **GO criteria**: +2.0% or better + +--- + +## 5. 
Test Plan + +### 5.1 SSOT Baseline (10-run) + +```bash +# Control (commit-once disabled) +HAKMEM_FREE_PATH_COMMIT_ONCE=0 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_control.txt + +# Treatment (commit-once enabled) +HAKMEM_FREE_PATH_COMMIT_ONCE=1 RUNS=10 scripts/run_mixed_10_cleanenv.sh > /tmp/phase85_treatment.txt +``` + +**Expected baseline**: 55.53M ops/s (from recent allocator matrix) + +**GO threshold**: 55.53M × 1.02 = **56.64M ops/s** (treatment mean) + +### 5.2 Safety Tests + +```bash +# Test 1: LARSON_FIX incompatibility +HAKMEM_TINY_LARSON_FIX=1 HAKMEM_FREE_PATH_COMMIT_ONCE=1 ./bench_random_mixed_hakmem 1000000 400 1 +# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible" + +# Test 2: Invalid route scenario (manually inject via debugging) +# Expected: Log "[FREE_COMMIT_ONCE] FAIL-FAST: C4 route=X not LEGACY" +``` + +### 5.3 Performance Profile + +Optional (if time permits): + +```bash +# Perf stat comparison +HAKMEM_FREE_PATH_COMMIT_ONCE=0 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1 +HAKMEM_FREE_PATH_COMMIT_ONCE=1 perf stat -e branches,branch-misses ./bench_random_mixed_hakmem 20000000 400 1 +``` + +**Expected**: 8-12% reduction in branches, <1% change in branch misses + +--- + +## 6. Rollback Strategy + +### Immediate Rollback (No Recompile) +```bash +export HAKMEM_FREE_PATH_COMMIT_ONCE=0 +``` + +### Box Removal (Recompile) +```bash +make clean +BOX_FREE_PATH_COMMIT_ONCE_FIXED=0 make bench_random_mixed_hakmem +``` + +### File Reversions +- Remove: `core/box/free_path_commit_once_fixed_box.{h,c}` +- Revert: `core/front/malloc_tiny_fast.h` (remove Phase 85 block) +- Revert: `core/bench_profile.h` (remove refresh call) +- Revert: `Makefile` (remove box flag) + +--- + +## 7.
Expected Results + +### 7.1 Performance Target + +| Metric | Control | Treatment | Delta | Status | +|--------|---------|-----------|-------|--------| +| Mean (M ops/s) | 55.53 | 56.64+ | +2.0%+ | GO threshold | +| CV (%) | 1.5-2.0 | 1.5-2.0 | stable | required | +| Branch reduction | baseline | -8-12% | ~10% | expected | + +### 7.2 GO/NO-GO Decision + +**GO if**: +- Treatment mean ≥ 56.64M ops/s (+2.0%) +- CV remains stable (<3%) +- No regressions in other scenarios (json/mir/vm) +- Fail-fast tests pass + +**NO-GO if**: +- Treatment mean < 56.64M ops/s +- CV increases significantly (>3%) +- Regressions observed +- Fail-fast mechanisms fail + +### 7.3 Risk Assessment + +**Low Risk**: +- Scope limited to LEGACY route (C4-C7, 129-256 bytes) +- ENV gate allows instant rollback +- Fail-fast for LARSON_FIX ensures safety +- Phase 9/10 MONO optimizations unaffected (fall through on cache miss) + +**Potential Issues**: +- Layout tax: New code path may cause I-cache/register pressure (mitigated by early placement at line ~950) +- Indirect call overhead: Cached function pointer may have misprediction cost (likely negligible vs branch reduction) +- Route dynamics: If route changes at runtime (unlikely), commit-once becomes stale (requires bench_profile refresh) + +--- + +## 8. Success Criteria Summary + +1. ✅ Build completes without errors +2. ✅ Fail-fast tests pass (LARSON_FIX=1, invalid route) +3. ✅ SSOT 10-run treatment ≥ 56.64M ops/s (+2.0%) +4. ✅ CV remains stable (<3%) +5. ✅ No regressions in other scenarios + +**If all criteria met**: Merge to master, update CURRENT_TASK.md, record in PERFORMANCE_TARGETS_SCORECARD.md + +**If NO-GO**: Keep as research box, document findings, archive plan. + +--- + +## 9.
References + +- Phase 78-1 pattern: `core/box/tiny_inline_slots_fixed_mode_box.{h,c}` +- Free path implementation: `core/front/malloc_tiny_fast.h:919-1221` +- LARSON_FIX constraint: `core/box/tiny_larson_fix_env_box.h` +- Route snapshot: `core/hakmem_tiny.c:64-65` (g_tiny_route_class, g_tiny_route_snapshot_done) +- SSOT validation: `scripts/run_mixed_10_cleanenv.sh` diff --git a/docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_RESULTS.md b/docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_RESULTS.md new file mode 100644 index 00000000..6b41cac9 --- /dev/null +++ b/docs/analysis/PHASE85_FREE_PATH_COMMIT_ONCE_RESULTS.md @@ -0,0 +1,68 @@ +# Phase 85: Free Path Commit-Once (LEGACY-only) - Results + +## Goal + +In the `free_tiny_fast()` free path, the **"ceremony" leading up to the LEGACY fallback (mono/policy/route computation)** is committed once +at the bench_profile boundary and **removed from the hot path**. + +- Scope: **LEGACY route only** for C4-C7 +- Reversible: `HAKMEM_FREE_PATH_COMMIT_ONCE=0/1` +- Safety: if `HAKMEM_TINY_LARSON_FIX=1`, fail fast and leave the commit disabled + +## Implementation + +- New box: + - `core/box/free_path_commit_once_fixed_box.h` + - `core/box/free_path_commit_once_fixed_box.c` +- Integration: + - `core/bench_profile.h` calls `free_path_commit_once_refresh_from_env()` + - `free_tiny_fast()` in `core/front/malloc_tiny_fast.h` does an early handler dispatch before the Phase 9/10 checks +- Build: + - Added `core/box/free_path_commit_once_fixed_box.o` to the `Makefile` + +## A/B Results (SSOT, 10-run) + +Control (`HAKMEM_FREE_PATH_COMMIT_ONCE=0`) +- Mean: 52.75M ops/s +- Median: 52.94M ops/s +- Min: 51.70M ops/s +- Max: 53.77M ops/s + +Treatment (`HAKMEM_FREE_PATH_COMMIT_ONCE=1`) +- Mean: 52.30M ops/s +- Median: 52.42M ops/s +- Min: 51.04M ops/s +- Max: 53.03M ops/s + +Delta: **-0.86% (NO-GO)** + +## Diagnosis + +### 1) Overlap with the Phase 10 (MONO LEGACY DIRECT) optimization + +`free_tiny_fast_mono_legacy_direct_enabled()` already provides a **direct path for C4-C7** (skipping the policy snapshot), +so little "ceremony" was left for Phase 85 to remove on top of it. + +As a result, Phase 85 mostly brings in **extra gate/table lookups**, which makes a net gain unlikely. + +### 2) Function-pointer dispatch tax + +Phase 85 introduces an **indirect call**, `entry->handler(base, class_idx, env)`. +Indirect branches like this are sensitive to branch-predictor and layout effects, and can lose on net under SSOT. + +### 3) Possible layout tax + +Inserting new code into the free hot path (`free_tiny_fast`) perturbs the text layout, +which makes a -0.x% sign flip likely (a known pattern). + +## Decision + +- **NO-GO**: keep `HAKMEM_FREE_PATH_COMMIT_ONCE` as a **default-OFF research box** +- Do not physically delete it (to avoid a layout-tax sign flip) + +## Follow-ups (if revisiting) + +1. Drop the handler cache and make commit-once a **bitmask (legacy_mask) only** (no indirect call). +2. Keep the structure where the hot path can exit before taking the `env snapshot`, and limit the hot side to **a single early return**. +3. Consider full "replacement" in Phase 86 only once the conditions for compiling out Phase 9/10 are in place (prefer same-binary A/B). + diff --git a/docs/analysis/RESEARCH_BOXES_SSOT.md b/docs/analysis/RESEARCH_BOXES_SSOT.md index 96bab12c..c9388c5e 100644 --- a/docs/analysis/RESEARCH_BOXES_SSOT.md +++ b/docs/analysis/RESEARCH_BOXES_SSOT.md @@ -39,3 +39,11 @@ - A research box is "deleted" only when all of the following hold: - (1) it has not been used for at least two weeks, (2) SSOT/bench_profile/cleanenv do not reference it, (3) a same-binary A/B has confirmed that removing it does not change performance (no layout tax). + +## SSOT for external consultation (paste packet) + +As frozen boxes accumulate, it becomes hard to explain to outsiders "which path is actually taken", +so review requests use the "compressed packet" as the source of truth: + +- Generate: `scripts/make_chatgpt_pro_packet_free_path.sh` +- Snapshot: `docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md` diff --git a/hakmem.d b/hakmem.d index 2dbae618..b39d3c62 100644 --- a/hakmem.d +++ b/hakmem.d @@ -198,6 +198,8 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \ core/box/../front/../box/free_cold_shape_stats_box.h \ core/box/../front/../box/free_tiny_fast_mono_dualhot_env_box.h \ core/box/../front/../box/free_tiny_fast_mono_legacy_direct_env_box.h \ + core/box/../front/../box/free_path_commit_once_fixed_box.h \ + core/box/../front/../box/free_path_legacy_mask_box.h \ core/box/../front/../box/alloc_passdown_ssot_env_box.h \ core/box/tiny_alloc_gate_box.h core/box/tiny_route_box.h \ core/box/tiny_alloc_gate_shape_env_box.h \ @@ -489,6 +491,8 @@ core/box/../front/../box/free_cold_shape_env_box.h: core/box/../front/../box/free_cold_shape_stats_box.h: core/box/../front/../box/free_tiny_fast_mono_dualhot_env_box.h: core/box/../front/../box/free_tiny_fast_mono_legacy_direct_env_box.h: 
+core/box/../front/../box/free_path_commit_once_fixed_box.h: +core/box/../front/../box/free_path_legacy_mask_box.h: core/box/../front/../box/alloc_passdown_ssot_env_box.h: core/box/tiny_alloc_gate_box.h: core/box/tiny_route_box.h: diff --git a/scripts/make_chatgpt_pro_packet_free_path.sh b/scripts/make_chatgpt_pro_packet_free_path.sh new file mode 100755 index 00000000..34a18c1a --- /dev/null +++ b/scripts/make_chatgpt_pro_packet_free_path.sh @@ -0,0 +1,127 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Generate a compact "free-path review packet" for sharing with ChatGPT Pro. +# Output: Markdown to stdout (copy/paste). +# +# Usage: +# scripts/make_chatgpt_pro_packet_free_path.sh > /tmp/free_path_packet.md +# +# Notes: +# - Extracts key functions with a simple brace counter. +# - Clips each snippet to keep it shareable. + +root_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +cd "${root_dir}" + +# Default clip is intentionally small; you can override via CLIP_LINES=... +clip="${CLIP_LINES:-160}" + +need() { command -v "$1" >/dev/null 2>&1 || { echo "[packet] missing $1" >&2; exit 1; }; } +need awk +need sed + +extract_func_n_clip() { + local file="$1" + local re="$2" + local nth="$3" + local clip_lines="$4" + + awk -v re="${re}" -v nth="${nth}" ' + function count_char(s, c, i,n) { n=0; for (i=1;i<=length(s);i++) if (substr(s,i,1)==c) n++; return n } + BEGIN { hit=0; started=0; depth=0; seen_open=0 } + { + if (!started) { + if ($0 ~ re) { + hit++; + if (hit == nth) { + started=1; + } + } + } + if (started) { + print $0; + depth += count_char($0, "{"); + if (count_char($0, "{") > 0) seen_open=1; + depth -= count_char($0, "}"); + if (seen_open && depth <= 0) exit 0; + } + } + ' "${file}" | sed -n "1,${clip_lines}p" +} + +extract_func() { + extract_func_n_clip "$1" "$2" 1 "${clip}" +} + +md_code() { + local lang="$1" + local file="$2" + echo "" + echo "### \`${file}\`" + echo "\`\`\`${lang}" + cat + echo "\`\`\`" +} + +cat <<'MD' +# Hakmem free-path review 
packet (compact) + +Goal: understand remaining fixed costs vs mimalloc/tcmalloc, with Box Theory (single boundary, reversible ENV gates). + +SSOT bench conditions (current practice): +- `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE` +- `ITERS=20000000 WS=400 RUNS=10` +- run via `scripts/run_mixed_10_cleanenv.sh` + +Request: +1) Where is the dominant fixed cost on the free path now? +2) What structural change would give +5–10% without breaking Box Theory? +3) What NOT to do (layout tax pitfalls)? +MD + +echo "" +echo "## Code excerpts (clipped)" + +# We focus on the hot tiny-free pipeline (the most actionable for instruction/branch work). +# If the reviewer needs wrapper/registry code too, we can provide a larger packet. + +# A) tiny_free_gate_try_fast(): user_ptr -> class_idx/base -> tiny_hot_free_fast()/fallback +extract_func core/box/tiny_free_gate_box.h '^static inline int tiny_free_gate_try_fast\\(void\\* user_ptr\\)' | md_code c core/box/tiny_free_gate_box.h + +# B) free_tiny_fast(): main Tiny free dispatcher (hot/cold + env snapshot) +extract_func_n_clip core/front/malloc_tiny_fast.h '^static inline int free_tiny_fast\\(void\\* ptr\\)' 1 220 | md_code c core/front/malloc_tiny_fast.h + +# C) tiny_hot_free_fast(): TLS unified cache push +extract_func core/box/tiny_front_hot_box.h '^static inline int tiny_hot_free_fast\\(int class_idx, void\\* base\\)' | md_code c core/box/tiny_front_hot_box.h + +# D) tiny_legacy_fallback_free_base_with_env(): inline-slots cascade + unified_cache_push(_fast) +extract_func_n_clip core/box/tiny_legacy_fallback_box.h '^static inline void tiny_legacy_fallback_free_base_with_env\\(void\\* base, uint32_t class_idx, const HakmemEnvSnapshot\\* env\\)' 1 260 | md_code c core/box/tiny_legacy_fallback_box.h + +cat <<'MD' + +## Questions to answer (please be concrete) + +1) In these snippets, which checks/branches are still "per-op fixed taxes" on the hot free path?
+ - Please point to specific lines/conditions and estimate cost (branches/instructions or dependency chain). + +2) Is `tiny_hot_free_fast()` already close to optimal, and the real bottleneck is upstream (user->base/classify/route)? + - If yes, what's the smallest structural refactor that removes that upstream fixed tax? + +3) Should we introduce a "commit once" plan (freeze the chosen free path), or is branch prediction already making lazy-init checks ~free here? + - If "commit once", where should it live to avoid runtime gate overhead (bench_profile refresh boundary vs per-op)? + +4) We have had many layout-tax regressions from code removal/reordering. + - What patterns here are most likely to trigger layout tax if changed? + - How would you stage a safe A/B (same binary, ENV toggle) for your proposal? + +5) If you could change just ONE of: + - pointer classification to base/class_idx, + - route determination, + - unified cache push/pop structure, + which is highest ROI for +5–10% on WS=400? + +MD + +echo "" +echo "[packet] done"
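The packet script's `extract_func_n_clip()` uses a simple brace counter to decide where a function ends. The C sketch below restates that stopping rule explicitly. It is an illustration, not part of the patch, and differs from the script in three ways:

- `extract_block()` and `count_char()` are hypothetical helpers.
- It matches with a plain substring (`strstr`) where the script uses an awk regex.
- It omits the `sed` line clip.

Like the awk version, it does not skip braces inside string literals or comments, so such braces can end a block early or late.

```c
#include <assert.h>  /* used by the usage check */
#include <stddef.h>
#include <string.h>

/* Number of occurrences of ch in s. */
static int count_char(const char* s, char ch) {
    int n = 0;
    for (; *s; s++) {
        if (*s == ch) n++;
    }
    return n;
}

/* Locate the first line containing `needle`, then extend until the braces
 * opened from that point on are balanced again. Mirrors the awk logic:
 *   depth     = '{' count minus '}' count since the matching line
 *   seen_open = set once any '{' has been seen, so a signature line that
 *               ends before its opening brace does not end the block.
 * Returns the block length in lines (0 if `needle` is never found) and
 * stores the index of the first line in *first. */
static size_t extract_block(const char* const* lines, size_t n,
                            const char* needle, size_t* first) {
    int depth = 0;
    int seen_open = 0;
    size_t start = n;  /* n means "not started yet" */

    for (size_t i = 0; i < n; i++) {
        if (start == n) {
            if (strstr(lines[i], needle) == NULL) continue;
            start = i;
        }
        int opens = count_char(lines[i], '{');
        depth += opens;
        if (opens > 0) seen_open = 1;
        depth -= count_char(lines[i], '}');
        if (seen_open && depth <= 0) {
            *first = start;
            return i - start + 1;
        }
    }
    if (start == n) return 0;
    *first = start;
    return n - start;  /* unbalanced: like the awk version, run to the end */
}
```

The `seen_open` guard matters for the signature patterns the script passes in. Those patterns match the `static inline ...(...)` line, which may not contain the opening brace yet.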