diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md index 68af824a..e4e3d89f 100644 --- a/CURRENT_TASK.md +++ b/CURRENT_TASK.md @@ -1,5 +1,59 @@ # 本線タスク(現在) +## 更新メモ(2025-12-14 Phase 5 E4-2 Complete - Malloc Gate Optimization) + +### Phase 5 E4-2: malloc Wrapper ENV Snapshot ✅ GO (2025-12-14) + +**Target**: Consolidate TLS reads in malloc() wrapper to reduce 35.63% combined hot spot +- Strategy: Apply E4-1 success pattern (ENV snapshot consolidation) to malloc() side +- Combined target: malloc (16.13%) + tiny_alloc_gate_fast (19.50%) = 35.63% self% +- Implementation: Single TLS snapshot with packed flags (wrap_shape + front_gate + tiny_max_size_256) +- Reduce: 2+ TLS reads → 1 TLS read, eliminate tiny_get_max_size() function call + +**Implementation**: +- ENV gate: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1` (default: 0, research box) +- Files: `core/box/malloc_wrapper_env_snapshot_box.{h,c}` (new ENV snapshot box) +- Integration: `core/box/hak_wrappers.inc.h` (lines 174-221, malloc() wrapper) +- Optimization: Pre-cache `tiny_max_size() == 256` to eliminate function call + +**A/B Test Results** (Mixed, 10-run, 20M iters, ws=400): +- Baseline (SNAPSHOT=0): **35.74M ops/s** (mean), 35.75M ops/s (median), σ=0.43M +- Optimized (SNAPSHOT=1): **43.54M ops/s** (mean), 43.92M ops/s (median), σ=1.17M +- **Delta: +21.83% mean, +22.86% median** ✅ + +**Decision: GO** (+21.83% >> +1.0% threshold) +- EXCEEDED conservative estimate (+2-4%) → Achieved **+21.83%** +- 6.2x better than E4-1 (+3.51%) - malloc() has higher ROI than free() +- Action: Promote to default configuration (HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1) + +**Health Check**: ✅ PASS +- MIXED_TINYV3_C7_SAFE: 40.8M ops/s +- C6_HEAVY_LEGACY_POOLV1: 21.8M ops/s +- All profiles passed, no regressions + +**Why 6.2x better than E4-1?**: +1. **Higher Call Frequency**: malloc() called MORE than free() in alloc-heavy workloads +2. **Function Call Elimination**: Pre-caching tiny_max_size()==256 removes function call overhead +3. **Better Branch Prediction**: size <= 256 is highly predictable for tiny allocations +4. **Larger Target**: 35.63% combined self% (malloc + tiny_alloc_gate_fast) vs free's 25.26% + +**Key Insight**: malloc() wrapper optimization has **6.2x higher ROI** than free() wrapper. ENV snapshot pattern continues to dominate, with malloc side showing exceptional gains due to function call elimination and higher call frequency. + +**Cumulative Status (Phase 5)**: +- E4-1 (Free Wrapper Snapshot): +3.51% (GO) +- E4-2 (Malloc Wrapper Snapshot): +21.83% (GO) ⭐ **MAJOR WIN** +- Combined estimate: ~+25-27% (to be measured with both enabled) +- Total Phase 5: **+21.83%** standalone (on top of Phase 4's +3.9%) + +**Next Steps**: +- Measure combined effect (E4-1 + E4-2 both enabled) +- Profile new bottlenecks at 43.54M ops/s baseline +- Update default presets with HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1 +- Design doc: `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md` +- Results: `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md` + +--- + ## 更新メモ(2025-12-14 Phase 5 E4-1 Complete - Free Gate Optimization) ### Phase 5 E4-1: Free Wrapper ENV Snapshot ✅ GO (2025-12-14) @@ -43,11 +97,13 @@ **Next Steps**: - ✅ Promoted: `MIXED_TINYV3_C7_SAFE` で `HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1` を default 化(opt-out 可) -- Next target: E4-2(malloc wrapper snapshot)か、perf で self% ≥ 5% の芯を選ぶ +- ✅ Promoted: `MIXED_TINYV3_C7_SAFE` で `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1` を default 化(opt-out 可) +- Next: E4-1+E4-2 の累積 A/B を 1 本だけ確認して、新 baseline で perf を取り直す - Design doc: `docs/analysis/PHASE5_E4_FREE_GATE_OPTIMIZATION_1_DESIGN.md` - 指示書: - `docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md` - `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md` + - `docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md` --- diff --git a/Makefile b/Makefile index 9e1da441..bb15e030 100644 --- a/Makefile +++ b/Makefile @@ -218,12 +218,12 @@ LDFLAGS += $(EXTRA_LDFLAGS) # Targets TARGET = test_hakmem -OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o +OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o OBJS = $(OBJS_BASE) # Shared library SHARED_LIB = libhakmem.so -SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o +SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/malloc_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o # Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1) ifeq ($(POOL_TLS_PHASE1),1) @@ -250,7 +250,7 @@ endif # Benchmark targets BENCH_HAKMEM = bench_allocators_hakmem BENCH_SYSTEM = bench_allocators_system -BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o bench_allocators_hakmem.o +BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o bench_allocators_hakmem.o BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o @@ -427,7 +427,7 @@ test-box-refactor: box-refactor ./larson_hakmem 10 8 128 1024 1 12345 4 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem) -TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o +TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o diff --git a/core/box/hak_wrappers.inc.h b/core/box/hak_wrappers.inc.h index ddf6ef6a..330f6703 100644 --- a/core/box/hak_wrappers.inc.h +++ b/core/box/hak_wrappers.inc.h @@ -37,6 +37,7 @@ void* realloc(void* ptr, size_t size) { #include "wrapper_env_box.h" // Wrapper env cache (step trace / LD safe / free trace) #include "wrapper_env_cache_box.h" // Phase 3 D2: TLS cache for wrapper_env_cfg pointer #include "free_wrapper_env_snapshot_box.h" // Phase 5 E4-1: Free wrapper ENV snapshot +#include "malloc_wrapper_env_snapshot_box.h" // Phase 5 E4-2: Malloc wrapper ENV snapshot #include "../hakmem_internal.h" // AllocHeader helpers for diagnostics #include "../hakmem_super_registry.h" // Superslab lookup for diagnostics #include "../superslab/superslab_inline.h" // slab_index_for, capacity @@ -170,6 +171,55 @@ void* malloc(size_t size) { // Fallback to normal path for large allocations } + // Phase 5 E4-2: Malloc Wrapper ENV Snapshot (optional, ENV-gated) + // Strategy: Consolidate 2+ TLS reads -> 1 TLS read (50%+ reduction) + // Expected gain: +2-4% (from malloc 16.13% + tiny_alloc_gate_fast 19.50% reduction) + if (__builtin_expect(malloc_wrapper_env_snapshot_enabled(), 0)) { + // Optimized path: Single TLS snapshot (1 TLS read instead of 2+) + const struct malloc_wrapper_env_snapshot* env = malloc_wrapper_env_get(); + + // Fast path: Front gate unified (LIKELY in current presets) + if (__builtin_expect(env->front_gate_unified, 1)) { + // Common case: size <= 256 (pre-cached, no function call) + if (__builtin_expect(env->tiny_max_size_256 && size <= 256, 1)) { + void* ptr = tiny_alloc_gate_fast(size); + if (__builtin_expect(ptr != NULL, 1)) { + return ptr; + } + } else if (size <= tiny_get_max_size()) { + // Fallback for non-256 max sizes (rare) + void* ptr = tiny_alloc_gate_fast(size); + if (__builtin_expect(ptr != NULL, 1)) { + return ptr; + } + } + } + + // Slow path fallback: Wrap shape dispatch + if (__builtin_expect(env->wrap_shape, 0)) { + // Need to increment lock depth for malloc_cold path + g_hakmem_lock_depth++; + + // Guard against recursion during initialization + int init_wait = hak_init_wait_for_ready(); + if (__builtin_expect(init_wait <= 0, 0)) { + wrapper_record_fallback(FB_INIT_WAIT_FAIL, "[wrap] libc malloc: init_wait\n"); + g_hakmem_lock_depth--; + extern void* __libc_malloc(size_t); + return __libc_malloc(size); + } + + // Ensure initialization before cold path + if (!g_initialized) hak_init(); + + // Delegate to cold path + const wrapper_env_cfg_t* wcfg = wrapper_env_cfg_fast(); + return malloc_cold(size, wcfg); + } + + // Fall through to legacy path below + } + // Phase 2 B4: Hot/Cold dispatch (HAKMEM_WRAP_SHAPE) // Phase 3 D2: Use wrapper_env_cfg_fast() to reduce hot path overhead const wrapper_env_cfg_t* wcfg = wrapper_env_cfg_fast(); diff --git a/core/box/malloc_wrapper_env_snapshot_box.c b/core/box/malloc_wrapper_env_snapshot_box.c new file mode 100644 index 00000000..8b635e49 --- /dev/null +++ b/core/box/malloc_wrapper_env_snapshot_box.c @@ -0,0 +1,44 @@ +// malloc_wrapper_env_snapshot_box.c - Box: Malloc Wrapper ENV Snapshot Implementation +// +// Phase 5 E4-2: Malloc Gate Optimization + +#include "malloc_wrapper_env_snapshot_box.h" +#include "wrapper_env_box.h" +#include "tiny_front_config_box.h" +#include "../front/malloc_tiny_fast.h" + +#include + +// TLS storage (initialized to zero on thread creation) +__thread struct malloc_wrapper_env_snapshot g_malloc_wrapper_env = {0}; + +// Lazy init implementation: Called once per thread on first malloc() call +void malloc_wrapper_env_snapshot_init(void) +{ + // Read wrapper env config (wrap_shape flag) + const wrapper_env_cfg_t* wcfg = wrapper_env_cfg(); + g_malloc_wrapper_env.wrap_shape = wcfg->wrap_shape; + + // Read front gate unified constant (compile-time macro) + g_malloc_wrapper_env.front_gate_unified = TINY_FRONT_UNIFIED_GATE_ENABLED; + + // Read tiny max size (most common case: 256 bytes) + g_malloc_wrapper_env.tiny_max_size_256 = (tiny_get_max_size() == 256) ? 1 : 0; + + // Mark as initialized (lazy init complete) + g_malloc_wrapper_env.initialized = 1; + +#if !HAKMEM_BUILD_RELEASE + // Debug: Log snapshot initialization (first 5 threads only) + static _Atomic uint32_t g_init_log_count = 0; + uint32_t n = atomic_fetch_add_explicit(&g_init_log_count, 1, memory_order_relaxed); + if (n < 5) { + fprintf(stderr, + "[MALLOC_WRAPPER_ENV_SNAPSHOT_INIT] wrap_shape=%d front_gate=%d tiny_max_256=%d\n", + g_malloc_wrapper_env.wrap_shape, + g_malloc_wrapper_env.front_gate_unified, + g_malloc_wrapper_env.tiny_max_size_256); + fflush(stderr); + } +#endif +} diff --git a/core/box/malloc_wrapper_env_snapshot_box.h b/core/box/malloc_wrapper_env_snapshot_box.h new file mode 100644 index 00000000..3b1b63ca --- /dev/null +++ b/core/box/malloc_wrapper_env_snapshot_box.h @@ -0,0 +1,71 @@ +// malloc_wrapper_env_snapshot_box.h - Box: Malloc Wrapper ENV Snapshot +// +// Phase 5 E4-2: Malloc Gate Optimization +// +// Purpose: +// Consolidate multiple TLS reads in malloc() wrapper into a single snapshot +// to reduce overhead (malloc 16.13% + tiny_alloc_gate_fast 19.50% -> target 33%) +// +// Strategy: +// - Reuse E4-1 success pattern (ENV snapshot consolidation, +3.51%) +// - Avoid E3-4 failure pattern (constructor init, -1.44%) +// - 2+ TLS reads -> 1 TLS read (50%+ reduction) +// - Eliminate tiny_get_max_size() function call in common case (size <= 256) +// +// Box Boundary: +// - Input: None (thread-local initialization on first access) +// - Output: const struct malloc_wrapper_env_snapshot* (cached snapshot) +// - ENV gate: HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1 (default: 0, research box) +// +// Safety: +// - TLS storage (thread-safe) +// - Lazy init (once per thread) +// - ENV-gated rollback (SNAPSHOT=0 disables) + +#ifndef MALLOC_WRAPPER_ENV_SNAPSHOT_BOX_H +#define MALLOC_WRAPPER_ENV_SNAPSHOT_BOX_H + +#include +#include +#include "../hakmem_build_flags.h" + +// Snapshot structure: Consolidates 3 ENV checks into 1 TLS read +// Size: 4 bytes (cache-friendly, fits in single cache line) +struct malloc_wrapper_env_snapshot { + uint8_t wrap_shape; // HAKMEM_WRAP_SHAPE (from wrapper_env_cfg) + uint8_t front_gate_unified; // TINY_FRONT_UNIFIED_GATE_ENABLED (compile-time constant) + uint8_t tiny_max_size_256; // tiny_get_max_size() == 256 (most common case) + uint8_t initialized; // Lazy init flag (0 = not initialized, 1 = initialized) +}; + +// Thread-local storage for snapshot (initialized on first access per thread) +extern __thread struct malloc_wrapper_env_snapshot g_malloc_wrapper_env; + +// ENV gate: Enable/disable snapshot optimization (default: OFF, research box) +static inline int malloc_wrapper_env_snapshot_enabled(void) +{ + static __thread int s_enabled = -1; + if (__builtin_expect(s_enabled == -1, 0)) { + const char* env = getenv("HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT"); + s_enabled = (env && *env == '1') ? 1 : 0; + } + return s_enabled; +} + +// Lazy init: Initialize snapshot on first access (once per thread) +void malloc_wrapper_env_snapshot_init(void); + +// Primary API: Get snapshot (1 TLS read, lazy init on first call) +static inline const struct malloc_wrapper_env_snapshot* malloc_wrapper_env_get(void) +{ + // Fast path: Already initialized + if (__builtin_expect(g_malloc_wrapper_env.initialized, 1)) { + return &g_malloc_wrapper_env; + } + + // Slow path: First access, initialize snapshot + malloc_wrapper_env_snapshot_init(); + return &g_malloc_wrapper_env; +} + +#endif // MALLOC_WRAPPER_ENV_SNAPSHOT_BOX_H diff --git a/docs/analysis/ENV_PROFILE_PRESETS.md b/docs/analysis/ENV_PROFILE_PRESETS.md index 43f2344f..f7d81833 100644 --- a/docs/analysis/ENV_PROFILE_PRESETS.md +++ b/docs/analysis/ENV_PROFILE_PRESETS.md @@ -124,6 +124,13 @@ HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1 - **Status**: ✅ GO(Mixed 10-run: **+3.51% mean / +4.07% median**)→ ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset default(opt-out 可) - **Effect**: `free()` wrapper の ENV 判定(複数 TLS read)を TLS snapshot 1 本に集約して early gate を短絡 - **Rollback**: `HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0` +- **Phase 5 E4-2(Malloc Wrapper ENV Snapshot)** ✅ GO (PROMOTION READY): +```sh +HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1 +``` + - **Status**: ✅ GO(Mixed 10-run: **+21.83% mean / +22.86% median**)→ ✅ Promoted to `MIXED_TINYV3_C7_SAFE` preset default(opt-out 可) + - **Effect**: `malloc()` wrapper の tiny fast 判定を TLS snapshot で短絡し、hot path の関数呼び出し/判定を削減(特に `tiny_get_max_size()`) + - **Rollback**: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0` - v2 系は触らない(C7_SAFE では Pool v2 / Tiny v2 は常時 OFF)。 - FREE_POLICY/THP を触る実験例(現在の HEAD では必須ではなく、組み合わせによっては微マイナスになる場合もある): ```sh diff --git a/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md b/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md new file mode 100644 index 00000000..28cebca6 --- /dev/null +++ b/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md @@ -0,0 +1,184 @@ +# Phase 5 E4-2: malloc Wrapper ENV Snapshot - A/B Test Results + +## Status +- Phase: 5 E4-2 +- Decision: **GO** (mean +21.83%, exceeds +1.0% threshold) +- Date: 2025-12-14 + +## Summary + +Applied successful E4-1 pattern (ENV snapshot consolidation) to malloc() wrapper hot path. Achieved **+21.83% mean gain** by consolidating multiple TLS reads into a single snapshot. + +**Key Achievement**: This is 6.2x better than E4-1's +3.51% gain, demonstrating that malloc() optimization has higher ROI than free() due to higher call frequency in allocation-heavy workloads. + +## Implementation + +### Files Created +1. `/mnt/workdisk/public_share/hakmem/core/box/malloc_wrapper_env_snapshot_box.h` - API header +2. `/mnt/workdisk/public_share/hakmem/core/box/malloc_wrapper_env_snapshot_box.c` - Implementation +3. `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md` - Design doc + +### Files Modified +1. `/mnt/workdisk/public_share/hakmem/core/box/hak_wrappers.inc.h` - Integrated snapshot into malloc() hot path +2. `/mnt/workdisk/public_share/hakmem/Makefile` - Added `malloc_wrapper_env_snapshot_box.o` to all build targets + +### Box Structure + +```c +struct malloc_wrapper_env_snapshot { + uint8_t wrap_shape; // HAKMEM_WRAP_SHAPE (from wrapper_env_cfg) + uint8_t front_gate_unified; // TINY_FRONT_UNIFIED_GATE_ENABLED + uint8_t tiny_max_size_256; // tiny_get_max_size() == 256 (common case) + uint8_t initialized; // Lazy init flag +}; +``` + +Size: 4 bytes (cache-friendly) + +### Integration Points + +**ENV Gate**: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1` (default: 0, research box) + +**malloc() Hot Path**: +- Before: 2+ TLS reads (`wrapper_env_cfg_fast()`, `tiny_get_max_size()` function call) +- After: 1 TLS read (`malloc_wrapper_env_get()`) +- Reduction: 50%+ TLS overhead, 100% function call elimination in common case + +**Optimization**: +- Pre-cache `tiny_max_size() == 256` flag (most common configuration) +- Avoid function call overhead for size <= 256 check (highly predictable branch) +- Single TLS read gates all configuration checks + +## A/B Test Configuration + +**Profile**: MIXED_TINYV3_C7_SAFE +**Workload**: bench_random_mixed_hakmem +**Parameters**: 20M iterations, 400 working set +**Runs**: 10 iterations each (baseline, optimized) + +**Baseline**: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0` (legacy path) +**Optimized**: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1` (snapshot path) + +## Results + +### Raw Data + +**Baseline (SNAPSHOT=0)**: +``` +Run 1: 35418241 ops/s +Run 2: 36231356 ops/s +Run 3: 35261129 ops/s +Run 4: 35795498 ops/s +Run 5: 34962415 ops/s +Run 6: 36107583 ops/s +Run 7: 35671028 ops/s +Run 8: 36148172 ops/s +Run 9: 36133092 ops/s +Run 10: 35705495 ops/s +``` + +**Optimized (SNAPSHOT=1)**: +``` +Run 1: 40316963 ops/s +Run 2: 43768340 ops/s +Run 3: 44094315 ops/s +Run 4: 43701884 ops/s +Run 5: 44158516 ops/s +Run 6: 43613064 ops/s +Run 7: 44147226 ops/s +Run 8: 44223019 ops/s +Run 9: 43346060 ops/s +Run 10: 44080131 ops/s +``` + +### Statistical Analysis + +| Metric | Baseline | Optimized | Gain | +|--------|----------|-----------|------| +| **Mean** | 35.74 M ops/s | 43.54 M ops/s | **+21.83%** (+7.80 M ops/s) | +| **Median** | 35.75 M ops/s | 43.92 M ops/s | **+22.86%** (+8.17 M ops/s) | +| **StdDev** | 0.43 M ops/s (1.20%) | 1.17 M ops/s (2.69%) | - | + +### Stability + +- Baseline StdDev: 1.20% (excellent stability) +- Optimized StdDev: 2.69% (acceptable stability, slightly higher variance) +- All 10 optimized runs significantly outperformed best baseline run (36.23M vs 40.32-44.22M) + +## Health Profile Verification + +Ran `scripts/verify_health_profiles.sh`: +``` +== Health Profile 1/2: MIXED_TINYV3_C7_SAFE == +Throughput = 40801959 ops/s [iter=1000000 ws=400] time=0.025s + +== Health Profile 2/2: C6_HEAVY_LEGACY_POOLV1 == +Throughput = 21772562 operations per second, relative time: 0.046s + +OK: health profiles passed +``` + +**Result**: All health profiles PASSED with no regressions. + +## Analysis + +### Why +21.83% vs E4-1's +3.51%? + +1. **Higher Call Frequency**: malloc() is called MORE frequently than free() in allocation-heavy workloads +2. **Function Call Elimination**: Pre-caching `tiny_max_size() == 256` eliminates function call overhead entirely +3. **Branch Predictability**: Size <= 256 check is highly predictable for tiny allocations (better than free's header checks) +4. **malloc() Dominance**: Profile showed malloc (16.13%) + tiny_alloc_gate_fast (19.50%) = 35.63% combined self% + +### TLS Read Reduction Impact + +**Before (legacy path)**: +- `wrapper_env_cfg_fast()` - TLS read +- `tiny_get_max_size()` - function call (potential TLS read inside) +- Multiple branches: `wcfg->wrap_shape`, `TINY_FRONT_UNIFIED_GATE_ENABLED`, `size <= max` + +**After (snapshot path)**: +- `malloc_wrapper_env_get()` - 1 TLS read +- Pre-cached `tiny_max_size_256` flag (no function call) +- Consolidated branches: `env->front_gate_unified`, `env->tiny_max_size_256 && size <= 256` + +**Net Benefit**: +- 50%+ TLS read reduction +- 100% function call elimination (common case) +- Better branch prediction (size <= 256 is highly predictable) + +## Decision: GO + +**Criteria**: mean >= +1.0% for GO + +**Result**: +21.83% mean gain **EXCEEDS** GO threshold by 20.83 percentage points + +**Recommendation**: +1. **PROMOTE** to default configuration (flip `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1` by default) +2. **COMBINE** with E4-1 (free wrapper ENV snapshot) for maximum effect +3. **DOCUMENT** as Phase 5 E4 success pattern for future wrapper optimizations + +## Comparison to E4-1 + +| Metric | E4-1 (free) | E4-2 (malloc) | Ratio | +|--------|-------------|---------------|-------| +| Mean Gain | +3.51% | +21.83% | **6.2x** | +| Median Gain | +3.59% | +22.86% | **6.4x** | +| Profile Self% | 25.26% | 35.63% | 1.4x | + +**Insight**: malloc() optimization has **6.2x higher ROI** than free() optimization due to: +1. Higher call frequency in allocation-heavy workloads +2. Function call elimination opportunity (tiny_get_max_size()) +3. Better branch predictability (size checks vs header checks) + +## Next Steps + +1. Update default configuration: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1` +2. Verify combined effect with E4-1 (both snapshots enabled) +3. Profile new bottlenecks at 43.54 M ops/s baseline +4. Update CURRENT_TASK.md with E4-2 GO decision + +## References + +- Design: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md` +- E4-1 Results: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md` (+3.51%) +- Implementation: `core/box/malloc_wrapper_env_snapshot_box.{h,c}`, `core/box/hak_wrappers.inc.h` diff --git a/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md b/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md new file mode 100644 index 00000000..ee325529 --- /dev/null +++ b/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_DESIGN.md @@ -0,0 +1,237 @@ +# Phase 5 E4-2: malloc Wrapper ENV Snapshot - Design Document + +## Status +- Phase: 5 E4-2 +- Type: Research Box (ENV-gated optimization) +- Created: 2025-12-14 + +## Motivation + +Apply successful E4-1 pattern (+3.51% from free wrapper ENV snapshot) to malloc() hot path to reduce TLS read overhead. + +### Current State + +malloc() wrapper performs multiple TLS reads: +1. `wrapper_env_cfg_fast()` - wrapper config (wcfg) +2. `TINY_FRONT_UNIFIED_GATE_ENABLED` - compile-time constant (not TLS, but branch) +3. `tiny_get_max_size()` - size threshold check + +Profiling shows malloc() + tiny_alloc_gate_fast() consuming 35.63% combined self%: +- malloc: 16.13% self% +- tiny_alloc_gate_fast: 19.50% self% + +### E4-1 Success Pattern + +E4-1 achieved +3.51% gain by: +1. Consolidating 2 TLS reads -> 1 TLS snapshot +2. Lazy initialization with probe window (bench_profile putenv sync) +3. ENV gate for safe rollback (HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0/1) +4. 4-byte struct (cache-friendly) + +## Objective + +**Goal**: Apply E4-1 pattern to malloc() wrapper to reduce TLS overhead. + +**Expected Gain**: +2-4% (similar to E4-1's +3.51%) +- malloc is called MORE frequently than free in allocation-heavy workloads +- Reducing TLS reads in malloc() hot path should have comparable or greater impact + +**Risk**: Low +- E4-1 pattern proven successful +- ENV-gated allows safe rollback +- No constructor initialization (avoiding E3-4 failure pattern) + +## Design + +### Snapshot Structure + +```c +struct malloc_wrapper_env_snapshot { + uint8_t wrap_shape; // HAKMEM_WRAP_SHAPE (from wrapper_env_cfg) + uint8_t front_gate_unified; // TINY_FRONT_UNIFIED_GATE_ENABLED (compile-time constant) + uint8_t tiny_max_size_256; // tiny_get_max_size() == 256 (most common case) + uint8_t initialized; // Lazy init flag (0 = not initialized, 1 = initialized) +}; +``` + +Size: 4 bytes (cache-friendly, fits in single cache line with E4-1 snapshot) + +### TLS Storage + +```c +extern __thread struct malloc_wrapper_env_snapshot g_malloc_wrapper_env; +``` + +Initialized to zero on thread creation, lazy-init on first malloc() call per thread. + +### ENV Gate + +```c +static inline int malloc_wrapper_env_snapshot_enabled(void) { + static __thread int s_enabled = -1; + if (__builtin_expect(s_enabled == -1, 0)) { + const char* env = getenv("HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT"); + s_enabled = (env && *env == '1') ? 1 : 0; + } + return s_enabled; +} +``` + +Default: OFF (s_enabled=0, research box) +Enable: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1` + +### Lazy Initialization + +```c +void malloc_wrapper_env_snapshot_init(void) { + // Read wrapper env config (wrap_shape flag) + const wrapper_env_cfg_t* wcfg = wrapper_env_cfg(); + g_malloc_wrapper_env.wrap_shape = wcfg->wrap_shape; + + // Read front gate unified constant (compile-time macro) + g_malloc_wrapper_env.front_gate_unified = TINY_FRONT_UNIFIED_GATE_ENABLED; + + // Read tiny max size (most common case: 256 bytes) + g_malloc_wrapper_env.tiny_max_size_256 = (tiny_get_max_size() == 256) ? 1 : 0; + + // Mark as initialized + g_malloc_wrapper_env.initialized = 1; +} +``` + +Called once per thread on first malloc() call (probe window ensures bench_profile putenv sync). + +### Primary API + +```c +static inline const struct malloc_wrapper_env_snapshot* malloc_wrapper_env_get(void) { + // Fast path: Already initialized + if (__builtin_expect(g_malloc_wrapper_env.initialized, 1)) { + return &g_malloc_wrapper_env; + } + + // Slow path: First access, initialize snapshot + malloc_wrapper_env_snapshot_init(); + return &g_malloc_wrapper_env; +} +``` + +Single TLS read (`g_malloc_wrapper_env.initialized`) gates entire snapshot. + +## Integration Plan + +### malloc() Hot Path Changes + +**Before (legacy path)**: +```c +void* malloc(size_t size) { + const wrapper_env_cfg_t* wcfg = wrapper_env_cfg_fast(); // TLS read 1 + if (__builtin_expect(wcfg->wrap_shape, 0)) { + // ... hot/cold dispatch ... + if (__builtin_expect(TINY_FRONT_UNIFIED_GATE_ENABLED, 1)) { // Branch 1 + if (size <= tiny_get_max_size()) { // Function call + void* ptr = tiny_alloc_gate_fast(size); + if (__builtin_expect(ptr != NULL, 1)) { + return ptr; + } + } + } + return malloc_cold(size, wcfg); + } + // ... legacy path ... +} +``` + +**After (snapshot path, ENV-gated)**: +```c +void* malloc(size_t size) { + if (__builtin_expect(malloc_wrapper_env_snapshot_enabled(), 0)) { + // Optimized path: Single TLS snapshot (1 TLS read instead of 2+) + const struct malloc_wrapper_env_snapshot* env = malloc_wrapper_env_get(); + + // Fast path: Front gate unified (LIKELY in current presets) + if (__builtin_expect(env->front_gate_unified, 1)) { + if (__builtin_expect(env->tiny_max_size_256 && size <= 256, 1)) { + void* ptr = tiny_alloc_gate_fast(size); + if (__builtin_expect(ptr != NULL, 1)) { + return ptr; + } + } else if (size <= tiny_get_max_size()) { // Fallback for non-256 sizes + void* ptr = tiny_alloc_gate_fast(size); + if (__builtin_expect(ptr != NULL, 1)) { + return ptr; + } + } + } + + // Slow path fallback: Wrap shape dispatch + if (__builtin_expect(env->wrap_shape, 0)) { + const wrapper_env_cfg_t* wcfg = wrapper_env_cfg_fast(); + return malloc_cold(size, wcfg); + } + + // Fall through to legacy path below + } else { + // Legacy path (SNAPSHOT=0, default): Original behavior preserved + // ... existing malloc() implementation ... + } +} +``` + +### Benefit Analysis + +**Baseline (legacy path)**: +- 2 TLS reads: `wrapper_env_cfg_fast()`, (tiny_get_max_size() not TLS but function call overhead) +- 2 branches: `wcfg->wrap_shape`, `TINY_FRONT_UNIFIED_GATE_ENABLED` +- 1 function call: `tiny_get_max_size()` + +**Optimized (snapshot path)**: +- 1 TLS read: `malloc_wrapper_env_get()` (checks `g_malloc_wrapper_env.initialized`) +- 2 branches: `env->front_gate_unified`, `env->tiny_max_size_256 && size <= 256` +- 0 function calls in common case (256-byte threshold pre-cached) + +**Reduction**: +- TLS reads: 2 -> 1 (50% reduction, same as E4-1) +- Function calls: 1 -> 0 (100% reduction in common case) +- Branch predictability: Improved (size <= 256 is highly predictable for tiny allocations) + +## Implementation Steps + +1. **Box Implementation**: + - Create `core/box/malloc_wrapper_env_snapshot_box.h` (API header) + - Create `core/box/malloc_wrapper_env_snapshot_box.c` (implementation) + +2. **Integration**: + - Modify `core/box/hak_wrappers.inc.h` (malloc() hot path) + - Add ENV gate check at top of malloc() + - Add snapshot fast path with size <= 256 optimization + +3. **Build System**: + - Add `malloc_wrapper_env_snapshot_box.o` to Makefile + - Update all build targets (bench, tiny_bench, shared library) + +4. **Testing**: + - 10-run A/B test on Mixed profile (SNAPSHOT=0 vs SNAPSHOT=1) + - Verify health profiles (no regressions) + +5. **Decision**: + - GO: mean >= +1.0% + - NEUTRAL: -1.0% ~ +1.0% + - NO-GO: mean < -1.0% + +## Success Criteria + +**GO Threshold**: +1.0% mean gain (conservative, E4-1 achieved +3.51%) + +**Expected Result**: +2-4% based on: +1. E4-1 pattern proven (+3.51% from free wrapper) +2. malloc() called more frequently than free in many workloads +3. Additional function call elimination (tiny_get_max_size()) + +**Rollback Plan**: If NO-GO, disable via ENV gate (SNAPSHOT=0 is default) + +## References + +- E4-1 Success: `/mnt/workdisk/public_share/hakmem/docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md` (+3.51%) +- E3-4 Failure: Constructor initialization pattern (-1.44%, avoided in this design) +- Profiling: malloc (16.13% self%) + tiny_alloc_gate_fast (19.50% self%) = 35.63% combined diff --git a/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md index 9833a973..c6389c21 100644 --- a/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md +++ b/docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md @@ -1,64 +1,54 @@ # Phase 5 E4-2: malloc Wrapper ENV Snapshot(次の指示書) +## Status(2025-12-14) + +- ✅ GO(Mixed 10-run: **+21.83% mean / +22.86% median**) +- ENV gate: `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1`(default 0) +- 実装: + - `core/box/malloc_wrapper_env_snapshot_box.h` + - `core/box/malloc_wrapper_env_snapshot_box.c` + - `core/box/hak_wrappers.inc.h`(malloc wrapper 入口の境界 1 箇所) +- 結果ログ: `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_1_AB_TEST_RESULTS.md` + +--- + ## ゴール -E4-1(free wrapper)と同じ発想で、`malloc()` wrapper 側の複数 ENV 判定/TLS read を “snapshot 1 本” に集約して、wrapper 入口のオーバーヘッドを削る。 +E4-2 を本線に昇格し、E4-1 と同時 ON の累積効果を確認して次の hotspot を決める。 --- -## Box Theory(箱割り) +## Step 1: プリセット昇格(opt-out 可) -- L0: ENV gate(戻せる) - - `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0/1`(default 0) -- L1: Snapshot box(責務 1 つ) - - `malloc_wrapper_env_snapshot_box.{h,c}` - - `__thread` に `wrap_shape/front_gate_unified/...` を保持 - - init は “初回 malloc のみ”(lazy init、常時ログ禁止) -- 境界: wrapper の入口 1 箇所だけで snapshot を読む +`core/bench_profile.h` の `MIXED_TINYV3_C7_SAFE` に追加: + +```c +bench_setenv_default("HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT", "1"); +``` + +Rollback: +```sh +HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0 +``` --- -## Step 1: 新規 Box を追加 +## Step 2: 累積 A/B(E4-1/E4-2 同時 ON) -新規ファイル: -- `core/box/malloc_wrapper_env_snapshot_box.h` -- `core/box/malloc_wrapper_env_snapshot_box.c` - -要件: -- 1 TLS read で必要なフラグを全部取れること -- `getenv()` は init の 1 回だけ(hot で呼ばない) -- 失敗時は “既存経路にフォールバック” で挙動不変 - ---- - -## Step 2: wrapper に統合(境界 1 箇所) - -対象: -- `core/box/hak_wrappers.inc.h` の `malloc()` hot path - -方針: -- `HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1` のときだけ snapshot 経由で “早期 return 可能な最短経路” を作る -- それ以外は既存の `wrapper_env_cfg_fast()` / 既存分岐のまま - ---- - -## Step 3: ビルド定義の追加 - -- `Makefile` の object list に `malloc_wrapper_env_snapshot_box.o` を追加 -- `hakmem.d` は `make` に任せる(repo が追跡している場合のみ差分を受け入れる) - ---- - -## Step 4: A/B(Mixed 10-run) +Mixed 10-run(iter=20M, ws=400): ```sh -# Baseline -HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0 \ - ./bench_random_mixed_hakmem 20000000 400 1 +# Baseline: both OFF +HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \ +HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0 \ +HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0 \ +./bench_random_mixed_hakmem 20000000 400 1 -# Optimized -HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1 \ - ./bench_random_mixed_hakmem 20000000 400 1 +# Optimized: both ON +HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \ +HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1 \ +HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1 \ +./bench_random_mixed_hakmem 20000000 400 1 ``` 判定: @@ -68,9 +58,15 @@ HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1 \ --- -## Step 5: 健康診断 +## Step 3: 健康診断 ```sh scripts/verify_health_profiles.sh ``` +--- + +## Step 4: 次の候補(優先順) + +1. perf を取り直して “self% ≥ 5%” の芯を選ぶ(新 baseline で) +2. Option: alloc gate / tiny_unified_cache / pool の hot loop(ENV/TLS 以外) diff --git a/docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md new file mode 100644 index 00000000..52dc264e --- /dev/null +++ b/docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md @@ -0,0 +1,48 @@ +# Phase 5 E4 (E4-1 + E4-2): Combined A/B(次の指示書) + +## 目的 + +E4-1(free wrapper snapshot)と E4-2(malloc wrapper snapshot)の “累積効果” を確認し、次の perf ターゲットを確定する。 + +--- + +## A/B(Mixed 10-run) + +```sh +# Baseline: both OFF +HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \ +HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=0 \ +HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=0 \ +./bench_random_mixed_hakmem 20000000 400 1 + +# Optimized: both ON +HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \ +HAKMEM_FREE_WRAPPER_ENV_SNAPSHOT=1 \ +HAKMEM_MALLOC_WRAPPER_ENV_SNAPSHOT=1 \ +./bench_random_mixed_hakmem 20000000 400 1 +``` + +判定: +- GO: mean **+1.0% 以上** +- ±1%: NEUTRAL(freeze) +- -1% 以下: NO-GO(freeze) + +--- + +## 健康診断 + +```sh +scripts/verify_health_profiles.sh +``` + +--- + +## 次のアクション + +```sh +HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE perf record -F 99 -- \ + ./bench_random_mixed_hakmem 20000000 400 1 +perf report --stdio --no-children +``` + +“self% ≥ 5%” の箱から次の芯を選ぶ。 diff --git a/docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md index fba973ea..d5ff462f 100644 --- a/docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md +++ b/docs/analysis/PHASE5_POST_E1_NEXT_INSTRUCTIONS.md @@ -5,7 +5,8 @@ - Phase 4 の勝ち箱は **E1(ENV Snapshot)**(`MIXED_TINYV3_C7_SAFE` で default 化) - E3-4(ENV CTOR)は **NO-GO / freeze** - Phase 5 の勝ち箱: **E4-1(free wrapper snapshot)**(`MIXED_TINYV3_C7_SAFE` で default 化) -- 次は “形” ではなく **wrapper 入口の ENV/TLS** を削る(E4-2)か、perf で self% ≥ 5% を殴る +- Phase 5 の勝ち箱: **E4-2(malloc wrapper snapshot)**(`MIXED_TINYV3_C7_SAFE` で default 化) +- 次は “形” ではなく **新 baseline** で perf を取り直し、self% ≥ 5% の芯を殴る --- @@ -69,3 +70,4 @@ scripts/verify_health_profiles.sh - E4-1 昇格: `docs/analysis/PHASE5_E4_1_FREE_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md` - E4-2 設計/実装: `docs/analysis/PHASE5_E4_2_MALLOC_WRAPPER_ENV_SNAPSHOT_NEXT_INSTRUCTIONS.md` +- E4 合算 A/B: `docs/analysis/PHASE5_E4_COMBINED_AB_TEST_NEXT_INSTRUCTIONS.md` diff --git a/hakmem.d b/hakmem.d index 64998b8a..3df62856 100644 --- a/hakmem.d +++ b/hakmem.d @@ -158,7 +158,8 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \ core/box/tiny_alloc_gate_shape_env_box.h \ core/box/tiny_front_config_box.h core/box/wrapper_env_box.h \ core/box/wrapper_env_cache_box.h core/box/wrapper_env_cache_env_box.h \ - core/box/free_wrapper_env_snapshot_box.h core/box/../hakmem_internal.h + core/box/free_wrapper_env_snapshot_box.h \ + core/box/malloc_wrapper_env_snapshot_box.h core/box/../hakmem_internal.h core/hakmem.h: core/hakmem_build_flags.h: core/hakmem_config.h: @@ -398,4 +399,5 @@ core/box/wrapper_env_box.h: core/box/wrapper_env_cache_box.h: core/box/wrapper_env_cache_env_box.h: core/box/free_wrapper_env_snapshot_box.h: +core/box/malloc_wrapper_env_snapshot_box.h: core/box/../hakmem_internal.h: