diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index a432786f..2b597a03 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -224,12 +224,49 @@ Cumulative improvements achieved in Phase 6-10:
 
 **Next**: Proceed to the next gap hypothesis after the Phase 12 Strategic Pause
 
-### Next: Phase 14 (Pointer Chase Reduction / Tiny tcache)
+### Phase 14 v1: Pointer Chase Reduction (tcache-style) — NEUTRAL (+0.20%) ⚠️ RESEARCH BOX
 
-**Goal**: Move closer to system malloc's tcache and reduce the Tiny frontend's "array/FIFO/indirection" cost.
+**Date**: 2025-12-15
+**Verdict**: **NEUTRAL (+0.20%)** — Frozen as research box (default OFF, manual opt-in)
 
-- Design: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md`
-- Instructions: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md`
+**Target**: Reduce pointer-chase overhead with intrusive LIFO tcache layer (inspired by glibc tcache)
+
+**Strategy (v1)**:
+- Add intrusive LIFO tcache layer (L1) before existing array-based UnifiedCache
+- TLS per-class bins (head pointer + count)
+- Intrusive next pointers stored in blocks (via tiny_next_store/load SSOT)
+- Cap: 64 blocks per class (default, configurable)
+- ENV: `HAKMEM_TINY_TCACHE=0/1` (default: 0, OFF)
+
+**Results (Mixed 10-run)**:
+| Case | TCACHE | Mean (ops/s) | Median (ops/s) | Delta |
+|------|--------|--------------|----------------|-------|
+| A (baseline) | 0 | 51,083,379 | 50,955,866 | — |
+| B (optimized) | 1 | 51,186,838 | 51,255,986 | **+0.20%** (mean) / **+0.59%** (median) |
+
+**Key Findings**:
+1. **Mean delta: +0.20%** (below +1.0% GO threshold → NEUTRAL)
+2. **Median delta: +0.59%** (slightly better stability, but still NEUTRAL)
+3. **Expected ROI (+15-25%) not achieved** on Mixed workload
+4. ⚠️ **The v1 integration point is free-side-centric: the alloc hot path (`tiny_hot_alloc_fast()`) does not consume the tcache**
+   - Current state: `unified_cache_push()` feeds the tcache, but the alloc side reads only the FIFO (`g_unified_cache[].slots`) → the tcache tends to act as a sink in practice
+   - The v1 A/B result likely underestimates the ROI (Phase 14 v2 needs to confirm that the alloc path is actually wired to the tcache)
+
+**Possible Reasons for Lower ROI**:
+- **Workload mismatch**: Mixed (16–1024B) spans C0-C7, but tcache benefits may be concentrated in hot classes (C2/C3)
+- **Existing cache efficiency**: UnifiedCache array access may already be well-cached in L1/L2
+- **Cap too small**: Default cap=64 may cause frequent overflow to array cache
+- **Intrusive next overhead**: Writing/reading next pointers may offset pointer-chase reduction
+
+**Action**:
+- ✅ Freeze Phase 14 v1 as research box (default OFF)
+- ENV: `HAKMEM_TINY_TCACHE=0/1` (default: 0), `HAKMEM_TINY_TCACHE_CAP=64`
+- 📋 Results: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md`
+- 📋 Design: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md`
+- 📋 Instructions: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md`
+- 📋 Next (Phase 14 v2): `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md` (alloc/pop integration)
+
+**Future Work**: Consider per-class cap tuning or alternative pointer-chase reduction strategies
 
 ## Update notes (2025-12-14 Phase 5 E5-3 Analysis - Strategic Pivot)
 
diff --git a/Makefile b/Makefile
index ad914f5b..0408602a 100644
--- a/Makefile
+++ b/Makefile
@@ -218,12 +218,12 @@ LDFLAGS += $(EXTRA_LDFLAGS)
 
 # Targets
 TARGET = test_hakmem
-OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o
hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o 
core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o +OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o 
core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o 
core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o OBJS = $(OBJS_BASE) # Shared library SHARED_LIB = libhakmem.so -SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/malloc_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/box/tiny_c7_preserve_header_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o 
core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o +SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o 
core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/malloc_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/box/tiny_c7_preserve_header_env_box_shared.o core/box/tiny_tcache_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o 
hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o # Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1) ifeq ($(POOL_TLS_PHASE1),1) @@ -427,7 +427,7 @@ test-box-refactor: box-refactor ./larson_hakmem 10 8 128 1024 1 12345 4 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem) -TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o 
core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o +TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o 
hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o 
hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o diff --git a/core/bench_profile.h b/core/bench_profile.h index a187f8d9..5aa3eded 100644 --- a/core/bench_profile.h +++ b/core/bench_profile.h @@ -11,6 +11,7 @@ #include "box/hakmem_env_snapshot_box.h" // hakmem_env_snapshot_refresh_from_env (Phase 4 E1) #include "box/tiny_free_route_cache_env_box.h" // tiny_free_static_route_refresh_from_env (Phase 8) #include "box/tiny_c7_preserve_header_env_box.h" // tiny_c7_preserve_header_env_refresh_from_env (Phase 13 v1) +#include "box/tiny_tcache_env_box.h" // tiny_tcache_env_refresh_from_env 
(Phase 14 v1) #endif // env が未設定のときだけ既定値を入れる @@ -187,5 +188,7 @@ static inline void bench_apply_profile(void) { tiny_free_static_route_refresh_from_env(); // Phase 13 v1: Sync C7 preserve header ENV cache after bench_profile putenv defaults. tiny_c7_preserve_header_env_refresh_from_env(); + // Phase 14 v1: Sync tcache ENV cache after bench_profile putenv defaults. + tiny_tcache_env_refresh_from_env(); #endif } diff --git a/core/box/tiny_tcache_box.h b/core/box/tiny_tcache_box.h new file mode 100644 index 00000000..7e6b1f12 --- /dev/null +++ b/core/box/tiny_tcache_box.h @@ -0,0 +1,162 @@ +// ============================================================================ +// Phase 14 v1: Tiny tcache Box (L1) - Intrusive LIFO Cache +// ============================================================================ +// +// Purpose: Per-class intrusive LIFO cache (tcache-style) +// +// Design: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md +// +// Strategy: +// - Thread-local head pointers + count per class +// - Intrusive next pointers stored in blocks (via tiny_next_store/load) +// - Cap-limited LIFO (no overflow, delegate to UnifiedCache) +// - Hit path: O(1) pointer operations only (no array access) +// +// Invariants: +// - Only BASE pointers stored (never USER pointers) +// - count <= cap always +// - One block in tcache OR unified_cache, never both +// - Next pointer via tiny_next_store/load SSOT only +// +// API: +// tiny_tcache_try_push(class_idx, base) -> bool (handled?) 
+// tiny_tcache_try_pop(class_idx) -> void* (BASE or NULL) +// +// Safety: +// - Debug: assert count <= cap +// - Debug: assert base != NULL and reasonable range +// - Release: fast path, no checks +// +// ============================================================================ + +#ifndef TINY_TCACHE_BOX_H +#define TINY_TCACHE_BOX_H + +#include +#include +#include +#include "../hakmem_build_flags.h" +#include "../hakmem_tiny_config.h" // TINY_NUM_CLASSES +#include "../tiny_nextptr.h" // tiny_next_store/load SSOT +#include "tiny_tcache_env_box.h" // tiny_tcache_enabled/cap + +// ============================================================================ +// TLS State (per-thread, per-class) +// ============================================================================ + +typedef struct { + void* head; // BASE pointer to first block (or NULL) + uint16_t count; // Number of blocks in this tcache +} TinyTcacheBin; + +// Thread-local storage: 8 classes (C0-C7) +static __thread TinyTcacheBin g_tiny_tcache_bins[TINY_NUM_CLASSES]; + +// ============================================================================ +// Push (try to add block to tcache) +// ============================================================================ +// +// Arguments: +// class_idx - Tiny class index (0-7) +// base - BASE pointer to freed block +// +// Returns: +// true - Block accepted into tcache +// false - Tcache full (overflow), caller should use unified_cache +// +// Side effects: +// - Writes intrusive next pointer into block (via tiny_next_store) +// - Updates head and count +// + +static inline bool tiny_tcache_try_push(int class_idx, void* base) { + // ENV gate check (cached, should be fast) + if (!tiny_tcache_enabled()) { + return false; // Tcache disabled, fall through to unified_cache + } + + TinyTcacheBin* bin = &g_tiny_tcache_bins[class_idx]; + uint16_t cap = tiny_tcache_cap(); + + // Check capacity + if (bin->count >= cap) { + return false; // Overflow, delegate to 
unified_cache + } + + // Debug: validate base pointer +#if !HAKMEM_BUILD_RELEASE + if (base == NULL || (uintptr_t)base < 4096) { + fprintf(stderr, "[TINY_TCACHE] BUG: invalid base=%p in try_push (class=%d)\n", base, class_idx); + abort(); + } +#endif + + // LIFO push: link block to current head + tiny_next_store(base, class_idx, bin->head); + bin->head = base; + bin->count++; + + // Debug: check invariant +#if !HAKMEM_BUILD_RELEASE + assert(bin->count <= cap); +#endif + + return true; // Block accepted +} + +// ============================================================================ +// Pop (try to get block from tcache) +// ============================================================================ +// +// Arguments: +// class_idx - Tiny class index (0-7) +// +// Returns: +// BASE pointer - Block from tcache (LIFO order) +// NULL - Tcache empty, caller should use unified_cache +// +// Side effects: +// - Reads intrusive next pointer from block (via tiny_next_load) +// - Updates head and count +// + +static inline void* tiny_tcache_try_pop(int class_idx) { + // ENV gate check (cached, should be fast) + if (!tiny_tcache_enabled()) { + return NULL; // Tcache disabled, fall through to unified_cache + } + + TinyTcacheBin* bin = &g_tiny_tcache_bins[class_idx]; + + // Check if empty + if (bin->head == NULL) { + return NULL; // Miss, delegate to unified_cache + } + + // LIFO pop: unlink head + void* base = bin->head; + void* next = tiny_next_load(base, class_idx); + bin->head = next; + bin->count--; + + // Debug: validate popped pointer +#if !HAKMEM_BUILD_RELEASE + if (base == NULL || (uintptr_t)base < 4096) { + fprintf(stderr, "[TINY_TCACHE] BUG: invalid base=%p in try_pop (class=%d)\n", base, class_idx); + abort(); + } +#endif + + return base; // Hit (BASE pointer) +} + +// ============================================================================ +// Stats (optional, for diagnostics) +// 
============================================================================ + +// Get current count for a class (debug/stats only) +static inline uint16_t tiny_tcache_count(int class_idx) { + return g_tiny_tcache_bins[class_idx].count; +} + +#endif // TINY_TCACHE_BOX_H diff --git a/core/box/tiny_tcache_env_box.c b/core/box/tiny_tcache_env_box.c new file mode 100644 index 00000000..e20f1f7e --- /dev/null +++ b/core/box/tiny_tcache_env_box.c @@ -0,0 +1,68 @@ +// ============================================================================ +// Phase 14 v1: Tiny tcache ENV Box (L0) - Implementation +// ============================================================================ + +#include "tiny_tcache_env_box.h" +#include +#include +#include +#include + +// ============================================================================ +// Global State +// ============================================================================ + +_Atomic int g_tiny_tcache_enabled = -1; +_Atomic uint16_t g_tiny_tcache_cap = 0; + +// ============================================================================ +// Init (Cold Path) +// ============================================================================ + +int tiny_tcache_env_init(void) { + const char* env_enabled = getenv("HAKMEM_TINY_TCACHE"); + const char* env_cap = getenv("HAKMEM_TINY_TCACHE_CAP"); + + int enabled = 0; // default: OFF (opt-in) + uint16_t cap = 64; // default: 64 (glibc tcache-like) + + // Parse HAKMEM_TINY_TCACHE + if (env_enabled && (env_enabled[0] == '1' || strcmp(env_enabled, "true") == 0 || strcmp(env_enabled, "TRUE") == 0)) { + enabled = 1; + } + + // Parse HAKMEM_TINY_TCACHE_CAP + if (env_cap && *env_cap) { + int parsed = atoi(env_cap); + if (parsed > 0 && parsed <= 65535) { + cap = (uint16_t)parsed; + } + } + + // Cache results + atomic_store_explicit(&g_tiny_tcache_enabled, enabled, memory_order_relaxed); + atomic_store_explicit(&g_tiny_tcache_cap, cap, memory_order_relaxed); + + // Log once (stderr 
for immediate visibility) + if (enabled) { + char msg[128]; + int n = snprintf(msg, sizeof(msg), "[TINY_TCACHE] enabled (cap=%u)\n", (unsigned)cap); + if (n > 0 && n < (int)sizeof(msg)) { + ssize_t w = write(2, msg, (size_t)n); + (void)w; + } + } + + return enabled; +} + +// ============================================================================ +// Refresh (Cold Path, called from bench_profile) +// ============================================================================ + +void tiny_tcache_env_refresh_from_env(void) { + // Reset to uninitialized state (-1 / 0) + // Next call to tiny_tcache_enabled() / tiny_tcache_cap() will re-read ENV + atomic_store_explicit(&g_tiny_tcache_enabled, -1, memory_order_relaxed); + atomic_store_explicit(&g_tiny_tcache_cap, 0, memory_order_relaxed); +} diff --git a/core/box/tiny_tcache_env_box.h b/core/box/tiny_tcache_env_box.h new file mode 100644 index 00000000..b60d2254 --- /dev/null +++ b/core/box/tiny_tcache_env_box.h @@ -0,0 +1,93 @@ +// ============================================================================ +// Phase 14 v1: Tiny tcache ENV Box (L0) +// ============================================================================ +// +// Purpose: ENV gate for tcache-style intrusive LIFO cache +// +// Design: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md +// Instructions: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md +// +// Strategy: +// - Add intrusive LIFO tcache layer before array-based UnifiedCache +// - Reduce pointer-chase overhead (system malloc tcache pattern) +// - Hit: head pointer + intrusive next only (no array access) +// +// ENV: +// HAKMEM_TINY_TCACHE=0/1 (default: 0, opt-in) +// HAKMEM_TINY_TCACHE_CAP=64 (default: 64, per-class capacity) +// +// API: +// tiny_tcache_enabled() -> int +// tiny_tcache_cap() -> uint16_t +// tiny_tcache_env_refresh_from_env() +// +// Box Theory: +// - L0: This file (ENV gate, reversible) +// - L1: tiny_tcache_box.h (intrusive LIFO logic) 
+// - L2: tiny_unified_cache.h (integration point)
+//
+// Safety:
+// - ENV-gated (default OFF, opt-in)
+// - Reversible (ENV toggle)
+// - No call site changes (integration inside unified_cache)
+//
+// ============================================================================
+
+#ifndef TINY_TCACHE_ENV_BOX_H
+#define TINY_TCACHE_ENV_BOX_H
+
+#include <stdint.h>
+#include <stdatomic.h>
+
+// ============================================================================
+// Global State (L0)
+// ============================================================================
+
+// Cached state: -1 (uninitialized), 0 (disabled), 1 (enabled)
+extern _Atomic int g_tiny_tcache_enabled;
+
+// Cached capacity: 0 (uninitialized), >0 (cap value)
+extern _Atomic uint16_t g_tiny_tcache_cap;
+
+// ============================================================================
+// Hot Inline API (L0)
+// ============================================================================
+
+// Check if tcache is enabled
+// Returns: 1 if enabled, 0 if disabled
+static inline int tiny_tcache_enabled(void) {
+    int val = atomic_load_explicit(&g_tiny_tcache_enabled, memory_order_relaxed);
+
+    if (__builtin_expect(val == -1, 0)) {
+        // Lazy init: read ENV once
+        extern int tiny_tcache_env_init(void);
+        val = tiny_tcache_env_init();
+    }
+
+    return val;
+}
+
+// Get tcache capacity per class
+// Returns: capacity (default 64)
+static inline uint16_t tiny_tcache_cap(void) {
+    uint16_t cap = atomic_load_explicit(&g_tiny_tcache_cap, memory_order_relaxed);
+
+    if (__builtin_expect(cap == 0, 0)) {
+        // Lazy init: read ENV once
+        extern int tiny_tcache_env_init(void);
+        tiny_tcache_env_init();
+        cap = atomic_load_explicit(&g_tiny_tcache_cap, memory_order_relaxed);
+    }
+
+    return cap;
+}
+
+// ============================================================================
+// Cold API (L2)
+// ============================================================================
+
+// Refresh ENV cache (called from bench_profile after
putenv)
+// Pattern: Same as Phase 8/13 (FREE_STATIC_ROUTE, C7_PRESERVE_HEADER)
+extern void tiny_tcache_env_refresh_from_env(void);
+
+#endif // TINY_TCACHE_ENV_BOX_H
diff --git a/core/front/tiny_unified_cache.h b/core/front/tiny_unified_cache.h
index 54994e36..098cc58f 100644
--- a/core/front/tiny_unified_cache.h
+++ b/core/front/tiny_unified_cache.h
@@ -30,6 +30,7 @@
 #include "../hakmem_tiny_config.h" // For TINY_NUM_CLASSES
 #include "../box/ptr_type_box.h" // Phantom pointer types (BASE/USER)
 #include "../box/tiny_front_config_box.h" // Phase 8-Step1: Config macros
+#include "../box/tiny_tcache_box.h" // Phase 14 v1: Intrusive LIFO tcache
 
 // ============================================================================
 // Phase 3 C2 Patch 3: Bounds Check Compile-out
@@ -220,9 +221,16 @@ static inline int unified_cache_push(int class_idx, hak_base_ptr_t base) {
     // Fast path: Unified cache disabled → return 0 (not handled)
     if (__builtin_expect(!TINY_FRONT_UNIFIED_CACHE_ENABLED, 0)) return 0;
 
-    TinyUnifiedCache* cache = &g_unified_cache[class_idx]; // 1 cache miss (TLS)
     void* base_raw = HAK_BASE_TO_RAW(base);
 
+    // Phase 14 v1: Try tcache first (intrusive LIFO, no array access)
+    if (tiny_tcache_try_push(class_idx, base_raw)) {
+        return 1; // SUCCESS (tcache hit, no array access)
+    }
+
+    // Tcache overflow or disabled → fall through to array cache
+    TinyUnifiedCache* cache = &g_unified_cache[class_idx]; // 1 cache miss (TLS)
+
     // Phase 8-Step3: Lazy init check (conditional in PGO mode)
     // PGO builds assume bench_fast_init() prewarmed cache → remove check (-1 branch)
 #if !HAKMEM_TINY_FRONT_PGO
@@ -281,7 +289,23 @@ static inline hak_base_ptr_t unified_cache_pop_or_refill(int class_idx) {
     }
 #endif
 
-    // Try pop from cache (fast path)
+    // Phase 14 v1: Try tcache first (intrusive LIFO, no array access)
+    void* tcache_base = tiny_tcache_try_pop(class_idx);
+    if (tcache_base != NULL) {
+#if !HAKMEM_BUILD_RELEASE
+        g_unified_cache_hit[class_idx]++;
+#endif
+        // Performance
measurement: count cache hits (ENV enabled only)
+        if (__builtin_expect(unified_cache_measure_check(), 0)) {
+            atomic_fetch_add_explicit(&g_unified_cache_hits_global,
+                                      1, memory_order_relaxed);
+            atomic_fetch_add_explicit(&g_unified_cache_hits_by_class[class_idx],
+                                      1, memory_order_relaxed);
+        }
+        return HAK_BASE_FROM_RAW(tcache_base); // HIT (tcache, no array access)
+    }
+
+    // Tcache miss or disabled → try pop from array cache (fast path)
     if (__builtin_expect(cache->head != cache->tail, 1)) {
         void* base = cache->slots[cache->head]; // 1 cache miss (array access)
         cache->head = (cache->head + 1) & cache->mask;
diff --git a/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md
new file mode 100644
index 00000000..7f737027
--- /dev/null
+++ b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md
@@ -0,0 +1,184 @@
+# Phase 14 v1: Pointer-Chase Reduction (tcache-style) A/B Test Results
+
+**Date:** 2025-12-15
+**Benchmark:** Mixed (16–1024B) 10-run cleanenv
+**Target:** Reduce pointer-chase overhead with intrusive LIFO tcache layer
+**Expected ROI:** +15-25% (design estimate)
+**GO Threshold:** +1.0% mean improvement
+
+---
+
+## 1. Implementation Summary
+
+Phase 14 v1 adds an intrusive LIFO tcache layer (L1) before the existing array-based UnifiedCache, inspired by glibc tcache pattern.
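The intrusive LIFO idea summarized above can be illustrated with a minimal sketch. This is not the contents of `core/box/tiny_tcache_box.h`, which this diff does not show: the names `SketchBin`, `sketch_try_push`, `sketch_try_pop`, and the fixed constants are hypothetical, and the sketch assumes every cached block is at least `sizeof(void*)` bytes so the next pointer fits inside the free block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NUM_CLASSES 8   /* assumption: 8 size classes (C0-C7) */
#define SKETCH_CAP 64          /* assumption: mirrors the documented default cap */

/* Hypothetical per-class bin: head pointer + count. */
typedef struct {
    void*    head;
    uint16_t count;
} SketchBin;

static _Thread_local SketchBin g_sketch_bins[SKETCH_NUM_CLASSES];

/* Store/load the next pointer in the first bytes of the free block.
 * memcpy avoids alignment and strict-aliasing assumptions. */
static inline void sketch_next_store(void* block, void* next) {
    memcpy(block, &next, sizeof next);
}
static inline void* sketch_next_load(const void* block) {
    void* next;
    memcpy(&next, block, sizeof next);
    return next;
}

/* Returns 1 if the block was cached, 0 on overflow
 * (the caller would then fall back to the array cache). */
static inline int sketch_try_push(int class_idx, void* block) {
    SketchBin* bin = &g_sketch_bins[class_idx];
    if (bin->count >= SKETCH_CAP) return 0;
    sketch_next_store(block, bin->head);
    bin->head = block;
    bin->count++;
    return 1;
}

/* Returns the most recently pushed block, or NULL on miss. */
static inline void* sketch_try_pop(int class_idx) {
    SketchBin* bin = &g_sketch_bins[class_idx];
    void* block = bin->head;
    if (block == NULL) return NULL;
    bin->head = sketch_next_load(block);
    bin->count--;
    return block;
}
```

A hit touches only the bin head and the block itself, which is the pointer-chase reduction the design is after. The cost is one extra write (push) or read (pop) of the next pointer per operation.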
+
+**Key Components:**
+- `core/box/tiny_tcache_env_box.{h,c}` - L0 ENV gate (HAKMEM_TINY_TCACHE=0/1, default 0)
+- `core/box/tiny_tcache_box.h` - L1 intrusive LIFO cache (TLS per-class bins)
+- `core/front/tiny_unified_cache.h` - Integration (try tcache first, fall through to array cache)
+- `core/bench_profile.h` - Refresh sync for bench_profile
+
+**Important Note (wiring completeness):**
+- In v1, the tcache pop lives in `unified_cache_pop_or_refill()`, but the current main alloc hot path (`tiny_hot_alloc_fast()`) does not go through `unified_cache_pop_or_refill()`.
+- The free side, however, does feed the tcache via `unified_cache_push()`, so with `HAKMEM_TINY_TCACHE=1` **the tcache may become a "sink" and the ROI on the alloc/pop side may be unmeasurable**.
+- A follow-up fix (Phase 14 v2) should connect pop/push in `tiny_front_hot_box`, after which re-running the A/B test is recommended:
+  - `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`
+
+**Design:**
+- TLS state: 8 TinyTcacheBin structs (head pointer + count per class)
+- Intrusive next pointers stored in blocks (via tiny_next_store/load SSOT)
+- Capacity: 64 blocks per class (default, configurable via HAKMEM_TINY_TCACHE_CAP)
+- LIFO order: better cache locality than FIFO array cache
+- Two-layer fallback: tcache (fast) → unified_cache (overflow/miss)
+
+**ENV Control:**
+```bash
+export HAKMEM_TINY_TCACHE=0 # Baseline (tcache disabled)
+export HAKMEM_TINY_TCACHE=1 # Optimized (tcache enabled)
+export HAKMEM_TINY_TCACHE_CAP=64 # Capacity per class (default: 64)
+```
+
+---
+
+## 2.
A/B Test Results (Mixed 10-run)
+
+### Baseline (TCACHE=0):
+```
+Run 1: 51,264,525 ops/s
+Run 2: 50,950,925 ops/s
+Run 3: 51,500,295 ops/s
+Run 4: 51,698,050 ops/s
+Run 5: 50,396,686 ops/s
+Run 6: 50,960,807 ops/s
+Run 7: 50,616,179 ops/s
+Run 8: 51,817,424 ops/s
+Run 9: 50,762,958 ops/s
+Run 10: 50,865,941 ops/s
+```
+**Mean:** 51,083,379 ops/s
+**Median:** 50,955,866 ops/s
+
+### Optimized (TCACHE=1):
+```
+Run 1: 51,555,414 ops/s
+Run 2: 51,389,988 ops/s
+Run 3: 50,795,917 ops/s
+Run 4: 51,880,520 ops/s
+Run 5: 50,574,457 ops/s
+Run 6: 50,627,901 ops/s
+Run 7: 51,233,081 ops/s
+Run 8: 51,278,890 ops/s
+Run 9: 50,761,326 ops/s
+Run 10: 51,770,890 ops/s
+```
+**Mean:** 51,186,838 ops/s
+**Median:** 51,255,986 ops/s
+
+### Delta:
+- **Mean delta:** +0.20% (103,459 ops/s improvement)
+- **Median delta:** +0.59% (300,120 ops/s improvement)
+
+---
+
+## 3. Verdict: NEUTRAL
+
+**Result:** +0.20% mean improvement (below +1.0% GO threshold)
+
+**Analysis:**
+- Phase 14 v1 shows minimal performance impact on Mixed workload
+- Median delta (+0.59%) is slightly better than mean, suggesting some stability improvement
+- Both deltas are below the +1.0% GO threshold → NEUTRAL classification
+- Expected ROI (+15-25%) was not achieved
+
+**Possible Reasons for Lower-than-Expected ROI:**
+1. **Workload Mismatch:** Mixed workload (16–1024B) spans multiple classes (C0-C7), but tcache benefits may be concentrated in hot classes (C2/C3: 128B/256B). Mid/large classes (C5-C7) may not benefit as much.
+2. **Cache Locality vs Array Access:** While tcache reduces pointer-chasing, the existing UnifiedCache array access may already be well-cached in L1/L2, limiting improvement.
+3. **Cap Too Small:** Default cap=64 may be too small for high-churn workloads, causing frequent overflow to array cache.
+4. **Intrusive Next Overhead:** Writing/reading next pointers may add overhead that offsets the pointer-chase reduction.
+5.
**Incomplete hot-path coverage (v1):** Only the free side feeds the tcache and the alloc side never consumes it, so hits may be "invisible" (Phase 14 v2 must confirm that the path is actually wired in).
+
+**Comparison to Smoke Test:**
+- Smoke test (single run): +2.4% (51.88M vs 50.68M ops/s)
+- Formal 10-run: +0.20% mean, +0.59% median
+- Variance across runs suggests smoke test was an outlier
+
+---
+
+## 4. Recommendation: Freeze as Research Box
+
+**Decision:** Freeze Phase 14 v1 as research box (default OFF)
+
+**Rationale:**
+- NEUTRAL result (+0.20%) does not justify promotion to default
+- No measurable harm (close to baseline), suitable for research/experimentation
+- Future work may explore:
+  - Per-class cap tuning (hot classes get larger caps)
+  - Workload-specific profiling (C2/C3-heavy vs C5-C7-heavy)
+  - Alternative intrusive next pointer strategies
+
+**Next Steps:**
+1. Commit Phase 14 v1 implementation with NEUTRAL verdict
+2. Update CURRENT_TASK.md to freeze as research box
+3. Keep ENV gate (HAKMEM_TINY_TCACHE=0 default) for future experimentation
+4. Consider alternative approaches for pointer-chase reduction (e.g., deeper pipeline optimization, better prefetching)
+
+---
+
+## 5. Raw Data
+
+### Baseline (TCACHE=0):
+```
+51264525
+50950925
+51500295
+51698050
+50396686
+50960807
+50616179
+51817424
+50762958
+50865941
+```
+Mean: 51,083,379 ops/s
+Median: 50,955,866 ops/s
+
+### Optimized (TCACHE=1):
+```
+51555414
+51389988
+50795917
+51880520
+50574457
+50627901
+51233081
+51278890
+50761326
+51770890
+```
+Mean: 51,186,838 ops/s
+Median: 51,255,986 ops/s
+
+---
+
+## 6.
Files Modified
+
+### Created:
+- `core/box/tiny_tcache_env_box.h` - L0 ENV gate
+- `core/box/tiny_tcache_env_box.c` - ENV init/refresh implementation
+- `core/box/tiny_tcache_box.h` - L1 intrusive LIFO cache
+
+### Modified:
+- `core/front/tiny_unified_cache.h` - Integration (try tcache first)
+- `core/bench_profile.h` - Refresh sync
+- `Makefile` - Build system integration
+- `scripts/run_mixed_10_cleanenv.sh` - ENV leak prevention (already updated)
+
+---
+
+## 7. Conclusion
+
+Phase 14 v1 (Pointer-Chase Reduction via tcache-style intrusive LIFO) achieved **+0.20% mean improvement** on Mixed 10-run benchmark, which is **NEUTRAL** (below +1.0% GO threshold).
+
+**Final Status:** Freeze as research box (HAKMEM_TINY_TCACHE=0 default, OFF)
+
+**Future Work:** Consider per-class cap tuning or alternative pointer-chase reduction strategies.
diff --git a/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md
index caf63702..574803ad 100644
--- a/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md
+++ b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md
@@ -147,3 +147,12 @@ GO/NO-GO:
 - When the tcache hit rate is high, array accesses and the FIFO's reuse of old blocks can be avoided
 - The shortest move toward closing the gap behind "system malloc is fast" (tcache-like behavior)
 
+---
+
+## Update (2025-12-15)
+
+With only the v1 integration point (`core/front/tiny_unified_cache.h`), the current main alloc hot path (`tiny_hot_alloc_fast()`) does not consume the tcache, so
+with `HAKMEM_TINY_TCACHE=1` the tcache tends to become a "sink".
+
+Next, connect pop/push to the hot path (`core/box/tiny_front_hot_box.h`) and re-run the A/B test with the tcache actually wired in (Phase 14 v2):
+- `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`
diff --git a/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md
index 4e7dac60..9320329e 100644
--- a/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md
+++ b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md
@@ -109,3 +109,12 @@ When GO:
 When NO-GO/NEUTRAL:
 - research box freeze (kept at default OFF)
 
+---
+
+## Update (2025-12-15)
+
+With only the v1 integration point (`core/front/tiny_unified_cache.h`), the current main alloc hot path (`tiny_hot_alloc_fast()`) does not consume the tcache, so
+with `HAKMEM_TINY_TCACHE=1` the tcache tends to become a "sink".
+
+Next, connect pop/push to the hot path (`core/box/tiny_front_hot_box.h`) and re-run the A/B test with the tcache actually wired in (Phase 14 v2):
+- `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`
diff --git a/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md
new file mode 100644
index 00000000..13fb4370
--- /dev/null
+++ b/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md
@@ -0,0 +1,131 @@
+# Phase 14 v2: Pointer-Chase Reduction, Hot Path Integration: Next Instructions (Tiny tcache intrusive LIFO)
+
+## Status
+
+- Phase 14 v1 (tcache L1 added) was **NEUTRAL** on Mixed 10-run (+0.20% mean / +0.59% median)
+  - Results: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md`
+  - Implementation: `core/box/tiny_tcache_env_box.{h,c}` / `core/box/tiny_tcache_box.h` / `core/front/tiny_unified_cache.h`
+- However, the current v1 **only feeds the tcache on the free side (`unified_cache_push()`), while the alloc side (`tiny_hot_alloc_fast()`) never consumes it**, so:
+  - the tcache is effectively a sink, and ROI cannot be measured correctly
+  - the "tcache-style" premise (push/pop symmetry) is broken
+
+Phase 14 v2 connects the tcache to the **actual hot path of the tiny front** and redoes the A/B test properly.
+
+---
+
+## 0. Goal (GO criteria)
+
+On Mixed 10-run (clean env):
+- **GO**: mean +1.0% or higher
+- **NO-GO**: mean -1.0% or lower (immediate rollback / freeze)
+- **NEUTRAL**: within ±1.0% (research box freeze)
+
+Additional gate (required):
+- With `HAKMEM_TINY_TCACHE=1`, **tcache pops must actually occur** (a count of 0 means the design is not wired in)
+
+---
+
+## 1.
Box diagram (one boundary)
+
+```
+L0: tiny_tcache_env_box (ENV gate / refresh / rollback)
+  ↓
+L1: tiny_tcache_box (intrusive LIFO: push/pop, cap)
+  ↓
+L2: tiny_front_hot_box (hot alloc/free: tcache → unified_cache(FIFO))
+  ↓
+L3: cold/refill (unified_cache_refill → SuperSlab)
+```
+
+The boundary is fixed at exactly one place: **"tcache miss/overflow → existing UnifiedCache"**.
+
+---
+
+## 2. Implementation patch order (small, stacked steps)
+
+### Patch 1: Connect tcache pop to hot alloc (required)
+
+Target:
+- `core/box/tiny_front_hot_box.h`
+
+Change:
+- At the top of `tiny_hot_alloc_fast(int class_idx)`:
+  - try `tiny_tcache_try_pop(class_idx)`
+  - on HIT, return immediately via `tiny_header_finalize_alloc(base, class_idx)`
+  - on MISS, fall back to the existing FIFO (`cache->slots[head]`)
+
+Requirements:
+- Keep the diff minimal so the hot path does not bloat when tcache is OFF (default)
+- Strictly follow "when in doubt, fall back" (Fail-Fast)
+
+### Patch 2: Connect tcache push to hot free (recommended)
+
+Target:
+- `core/box/tiny_front_hot_box.h`
+
+Change:
+- At the top of `tiny_hot_free_fast(int class_idx, void* base)`:
+  - try `tiny_tcache_try_push(class_idx, base)`
+  - on SUCCESS, `return 1`
+  - only on overflow / disabled, go to the existing FIFO
+
+Aim:
+- Make the tcache effective on "direct push" paths that do not go through `unified_cache_push()`
+
+### Patch 3: Visibility (minimal, TLS)
+
+Candidate target:
+- `core/box/tiny_tcache_box.h` (TLS counters)
+
+Add (for debug / research):
+- `tcache_pop_hit/miss`
+- `tcache_push_hit/overflow`
+- Allow a "one-shot dump" to be emitted exactly once (ENV opt-in)
+
+Prohibited:
+- No atomic statistics on the hot path (lesson from Phase 12 / POOL-DN-BATCH)
+
+---
+
+## 3.
A/B test (same binary)
+
+Baseline:
+```sh
+HAKMEM_TINY_TCACHE=0 scripts/run_mixed_10_cleanenv.sh
+```
+
+Optimized:
+```sh
+HAKMEM_TINY_TCACHE=1 scripts/run_mixed_10_cleanenv.sh
+```
+
+Additional (check whether the effect is class-dependent):
+```sh
+HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_TCACHE=0 scripts/run_mixed_10_cleanenv.sh
+HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_TCACHE=1 scripts/run_mixed_10_cleanenv.sh
+```
+
+Cap sweep (research, only when needed):
+```sh
+HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=32 scripts/run_mixed_10_cleanenv.sh
+HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=64 scripts/run_mixed_10_cleanenv.sh
+HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=128 scripts/run_mixed_10_cleanenv.sh
+```
+
+---
+
+## 4. Health check (required)
+
+```sh
+scripts/verify_health_profiles.sh
+```
+
+---
+
+## 5. Verdict and handling
+
+- GO: promotion into `bench_profile` starts with **MIXED_TINYV3_C7_SAFE only** (staged rollout)
+- NEUTRAL/NO-GO: freeze Phase 14 v2 as a research box (stays default OFF)
+- Rollback:
+  - `export HAKMEM_TINY_TCACHE=0`
+
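The GO / NO-GO / NEUTRAL thresholds stated earlier in these instructions reduce to a small piece of arithmetic on the two 10-run means. The sketch below is illustrative only: `mean_delta_pct`, `classify`, and the `Verdict` enum are hypothetical names and not part of the repository's scripts.

```c
/* Hypothetical verdict type for the A/B gate. */
typedef enum { VERDICT_NO_GO, VERDICT_NEUTRAL, VERDICT_GO } Verdict;

/* Relative change of the optimized mean against the baseline mean, in percent. */
static double mean_delta_pct(double baseline_mean, double optimized_mean) {
    return (optimized_mean - baseline_mean) / baseline_mean * 100.0;
}

/* Apply the documented thresholds: +1.0% or more is GO,
 * -1.0% or less is NO-GO, anything in between is NEUTRAL. */
static Verdict classify(double delta_pct) {
    if (delta_pct >= 1.0)  return VERDICT_GO;
    if (delta_pct <= -1.0) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

Feeding in the Phase 14 v1 means (51,083,379 and 51,186,838 ops/s) gives a delta of roughly +0.20%, which `classify` maps to `VERDICT_NEUTRAL`, matching the v1 verdict. The additional gate (tcache pops must be non-zero) is a separate check and is not modeled here.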