Phase 14 v1: Pointer-Chase Reduction (tcache) NEUTRAL (+0.20%)

Implementation:
- Intrusive LIFO tcache layer (L1) before UnifiedCache
- TLS per-class bins (head pointer + count)
- Intrusive next pointers (via tiny_next_store/load SSOT)
- Cap: 64 blocks per class (default)
- ENV: HAKMEM_TINY_TCACHE=0/1 (default: 0, OFF)

A/B Test Results (Mixed 10-run):
- Baseline (TCACHE=0): 51,083,379 ops/s
- Optimized (TCACHE=1): 51,186,838 ops/s
- Mean delta: +0.20% (below +1.0% GO threshold)
- Median delta: +0.59%

Verdict: NEUTRAL - Freeze as research box (default OFF)

Root Cause (v1 wiring incomplete):
- Free side pushes to tcache via unified_cache_push()
- Alloc hot path (tiny_hot_alloc_fast) doesn't consume tcache
- tcache becomes a "sink" without an alloc-side pop → ROI not measurable

Files:
- Created: core/box/tiny_tcache_{env_box,box}.h, tiny_tcache_env_box.c
- Modified: core/front/tiny_unified_cache.h (integration)
- Modified: core/bench_profile.h (refresh sync)
- Modified: Makefile (build integration)
- Results: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md
- v2 Instructions: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md

Next: Phase 14 v2 (connect tcache to tiny_front_hot_box alloc/free hot path)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-15 01:28:50 +09:00
parent 0b306f72f4
commit f8fb05bc13
11 changed files with 729 additions and 9 deletions

@@ -224,12 +224,49 @@ Phase 6-10 cumulative improvements achieved:
**Next**: Proceed to the next gap hypothesis after the Phase 12 Strategic Pause
### Next: Phase 14 (Pointer Chase Reduction / Tiny tcache)
### Phase 14 v1: Pointer Chase Reduction (tcache-style) — NEUTRAL (+0.20%) ⚠️ RESEARCH BOX
**Goal**: Bring the Tiny frontend closer to system malloc's tcache and reduce its “array/FIFO/indirection” costs.
**Date**: 2025-12-15
**Verdict**: **NEUTRAL (+0.20%)** — Frozen as research box (default OFF, manual opt-in)
- Design: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md`
- Instructions: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md`
**Target**: Reduce pointer-chase overhead with intrusive LIFO tcache layer (inspired by glibc tcache)
**Strategy (v1)**:
- Add intrusive LIFO tcache layer (L1) before existing array-based UnifiedCache
- TLS per-class bins (head pointer + count)
- Intrusive next pointers stored in blocks (via tiny_next_store/load SSOT)
- Cap: 64 blocks per class (default, configurable)
- ENV: `HAKMEM_TINY_TCACHE=0/1` (default: 0, OFF)
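The Strategy bullets above describe a classic intrusive free list. Below is a minimal sketch. It assumes the next pointer lives in the first word of a free block; the real offset is whatever `tiny_next_store`/`tiny_next_load` define. `Bin`, `bin_push`, `bin_pop`, `next_store`, and `next_load` are hypothetical names, not HAKMEM API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for tiny_next_store/load: here the next pointer
 * is simply kept in the first word of the free block. */
static void next_store(void* base, void* next) { memcpy(base, &next, sizeof next); }
static void* next_load(void* base) { void* n; memcpy(&n, base, sizeof n); return n; }

typedef struct {
    void*    head;  /* most recently freed block (BASE), or NULL */
    uint16_t count; /* blocks currently linked into this bin */
} Bin;

/* Returns false when the bin is at cap; the caller falls back to the array cache. */
static bool bin_push(Bin* b, void* base, uint16_t cap) {
    if (b->count >= cap) return false;
    next_store(base, b->head);  /* base->next = old head */
    b->head = base;
    b->count++;
    return true;
}

/* Returns NULL on a miss; the caller falls back to the array cache. */
static void* bin_pop(Bin* b) {
    void* base = b->head;
    if (base == NULL) return NULL;
    b->head = next_load(base);  /* one dependent load: base->next */
    b->count--;
    return base;
}
```

A hit touches only `head`, `count`, and the first word of the returned block, which is the "no array access" property the design targets. The trade-off is that each pop must load `base->next` before the following pop can proceed.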
**Results (Mixed 10-run)**:
| Case | TCACHE | Mean (ops/s) | Median (ops/s) | Delta |
|------|--------|--------------|----------------|-------|
| A (baseline) | 0 | 51,083,379 | 50,955,866 | — |
| B (optimized) | 1 | 51,186,838 | 51,255,986 | **+0.20%** (mean) / **+0.59%** (median) |
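The deltas in the table are plain relative changes of B over A. Here is a small helper to reproduce them. It assumes that GO means a mean delta of at least +1.0%; `delta_pct` and `is_go` are hypothetical names.

```c
#include <stdbool.h>

/* Relative change of candidate B over baseline A, in percent. */
static double delta_pct(double baseline, double candidate) {
    return (candidate - baseline) / baseline * 100.0;
}

/* Assumption: GO requires a mean delta of at least +1.0%. */
static bool is_go(double mean_delta_pct) {
    return mean_delta_pct >= 1.0;
}
```

`delta_pct(51083379, 51186838)` is about 0.2025 and `delta_pct(50955866, 51255986)` is about 0.589. These round to the +0.20% (mean) and +0.59% (median) reported above. Both fall below the GO threshold.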
**Key Findings**:
1. **Mean delta: +0.20%** (below +1.0% GO threshold → NEUTRAL)
2. **Median delta: +0.59%** (slightly better stability, but still NEUTRAL)
3. **Expected ROI (+15-25%) not achieved** on Mixed workload
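4. ⚠️ **v1 wires the tcache in mainly on the free side: the alloc hot path (`tiny_hot_alloc_fast()`) never consumes the tcache**
   - Current state: `unified_cache_push()` feeds the tcache, but the alloc side reads only the FIFO (`g_unified_cache[].slots`) → the tcache effectively becomes a sink
   - The v1 A/B therefore likely underestimates the ROI (Phase 14 v2 must confirm the tcache path is actually live)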
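The toy counting model below shows why a push-only tcache acts as a sink. It is not HAKMEM code: `Model`, `run`, and the backend refill are hypothetical, and the FIFO is treated as unbounded. With `pop_tcache_first = false` it follows the v1 wiring described above. With `true` it sketches the alloc-side pop that Phase 14 v2 is meant to add.

```c
#include <stdbool.h>

/* Counts only, no real memory. tcache: capped LIFO bin; fifo: stand-in for
 * the array-based UnifiedCache; a miss in both goes to a backend refill. */
typedef struct {
    int  tcache, cap;   /* blocks parked in the tcache bin, and its cap */
    int  fifo;          /* blocks held in the array cache */
    long tcache_hits, fifo_hits, backend_allocs;
} Model;

static void model_free(Model* m) {
    if (m->tcache < m->cap) m->tcache++;  /* free side: tcache first */
    else m->fifo++;                        /* overflow goes to the array cache */
}

static void model_alloc(Model* m, bool pop_tcache_first) {
    if (pop_tcache_first && m->tcache > 0) { m->tcache--; m->tcache_hits++; }
    else if (m->fifo > 0) { m->fifo--; m->fifo_hits++; }
    else m->backend_allocs++;              /* both caches missed */
}

/* Runs `rounds` of: allocate `burst` blocks, then free all of them. */
static Model run(int cap, int burst, int rounds, bool pop_tcache_first) {
    Model m = { .cap = cap };
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < burst; i++) model_alloc(&m, pop_tcache_first);
        for (int i = 0; i < burst; i++) model_free(&m);
    }
    return m;
}
```

Take cap=64 and bursts of 32 blocks over 100 rounds. The v1 wiring parks 64 blocks in the bin and never hits it. The pop-first variant serves every round after the first from the tcache. The model ignores real costs and only shows where blocks end up.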
**Possible Reasons for Lower ROI**:
- **Workload mismatch**: Mixed (16-1024B) spans C0-C7, but tcache benefits may be concentrated in hot classes (C2/C3)
- **Existing cache efficiency**: UnifiedCache array access may already be well-cached in L1/L2
- **Cap too small**: Default cap=64 may cause frequent overflow to array cache
- **Intrusive next overhead**: Writing/reading next pointers may offset pointer-chase reduction
**Action**:
- ✅ Freeze Phase 14 v1 as research box (default OFF)
- ENV: `HAKMEM_TINY_TCACHE=0/1` (default: 0), `HAKMEM_TINY_TCACHE_CAP=64`
- 📋 Results: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md`
- 📋 Design: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md`
- 📋 Instructions: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md`
- 📋 Next (Phase 14 v2): `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md` (alloc/pop integration)
**Future Work**: Consider per-class cap tuning or alternative pointer-chase reduction strategies
## Update Notes (2025-12-14 Phase 5 E5-3 Analysis - Strategic Pivot)

@@ -218,12 +218,12 @@ LDFLAGS += $(EXTRA_LDFLAGS)
# Targets
TARGET = test_hakmem
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o 
core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o 
core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
OBJS = $(OBJS_BASE)
# Shared library
SHARED_LIB = libhakmem.so
SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/malloc_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/box/tiny_c7_preserve_header_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o 
tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o
SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/malloc_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/box/tiny_c7_preserve_header_env_box_shared.o core/box/tiny_tcache_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o 
core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o
# Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1)
ifeq ($(POOL_TLS_PHASE1),1)
@@ -427,7 +427,7 @@ test-box-refactor: box-refactor
./larson_hakmem 10 8 128 1024 1 12345 4
# Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o 
tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o 
tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o

@@ -11,6 +11,7 @@
#include "box/hakmem_env_snapshot_box.h" // hakmem_env_snapshot_refresh_from_env (Phase 4 E1)
#include "box/tiny_free_route_cache_env_box.h" // tiny_free_static_route_refresh_from_env (Phase 8)
#include "box/tiny_c7_preserve_header_env_box.h" // tiny_c7_preserve_header_env_refresh_from_env (Phase 13 v1)
#include "box/tiny_tcache_env_box.h" // tiny_tcache_env_refresh_from_env (Phase 14 v1)
#endif
// Set defaults only when the env var is not already set
@@ -187,5 +188,7 @@ static inline void bench_apply_profile(void) {
tiny_free_static_route_refresh_from_env();
// Phase 13 v1: Sync C7 preserve header ENV cache after bench_profile putenv defaults.
tiny_c7_preserve_header_env_refresh_from_env();
// Phase 14 v1: Sync tcache ENV cache after bench_profile putenv defaults.
tiny_tcache_env_refresh_from_env();
#endif
}
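The refresh call is needed because the L0 ENV box caches its `getenv` result on first use (`-1` means "not read yet"). Below is a simplified, hypothetical mirror of that pattern. `DEMO_TCACHE` and the `demo_*` functions are illustrative only, not HAKMEM API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv/unsetenv */
#endif
#include <stdatomic.h>
#include <stdlib.h>

/* -1 = not read yet, 0/1 = cached value (same convention as the L0 box). */
static _Atomic int g_demo_enabled = -1;

static int demo_env_init(void) {
    const char* v = getenv("DEMO_TCACHE");  /* hypothetical variable */
    int enabled = (v && v[0] == '1') ? 1 : 0;
    atomic_store_explicit(&g_demo_enabled, enabled, memory_order_relaxed);
    return enabled;
}

/* Hot-path getter: touches the environment only on first use. */
static int demo_enabled(void) {
    int e = atomic_load_explicit(&g_demo_enabled, memory_order_relaxed);
    return e >= 0 ? e : demo_env_init();
}

/* Cold path: drop the cached value so the next getter call re-reads ENV. */
static void demo_refresh_from_env(void) {
    atomic_store_explicit(&g_demo_enabled, -1, memory_order_relaxed);
}

/* Mirrors the bench_profile flow: default only if unset, then resync. */
static void demo_apply_profile(void) {
    setenv("DEMO_TCACHE", "1", 0);  /* overwrite=0 keeps a user-provided value */
    demo_refresh_from_env();
}
```

Without the refresh, a flag read before the profile applied its defaults would stay stale for the rest of the run.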

core/box/tiny_tcache_box.h
@@ -0,0 +1,162 @@
// ============================================================================
// Phase 14 v1: Tiny tcache Box (L1) - Intrusive LIFO Cache
// ============================================================================
//
// Purpose: Per-class intrusive LIFO cache (tcache-style)
//
// Design: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md
//
// Strategy:
// - Thread-local head pointers + count per class
// - Intrusive next pointers stored in blocks (via tiny_next_store/load)
// - Cap-limited LIFO (no overflow, delegate to UnifiedCache)
// - Hit path: O(1) pointer operations only (no array access)
//
// Invariants:
// - Only BASE pointers stored (never USER pointers)
// - count <= cap always
// - Each block is in either the tcache or unified_cache, never both
// - Next pointer via tiny_next_store/load SSOT only
//
// API:
// tiny_tcache_try_push(class_idx, base) -> bool (handled?)
// tiny_tcache_try_pop(class_idx) -> void* (BASE or NULL)
//
// Safety:
// - Debug: assert count <= cap
// - Debug: assert base != NULL and reasonable range
// - Release: fast path, no checks
//
// ============================================================================
#ifndef TINY_TCACHE_BOX_H
#define TINY_TCACHE_BOX_H
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>
#include <stdio.h>   // fprintf (debug validation)
#include <stdlib.h>  // abort (debug validation)
#include "../hakmem_build_flags.h"
#include "../hakmem_tiny_config.h" // TINY_NUM_CLASSES
#include "../tiny_nextptr.h" // tiny_next_store/load SSOT
#include "tiny_tcache_env_box.h" // tiny_tcache_enabled/cap
// ============================================================================
// TLS State (per-thread, per-class)
// ============================================================================
typedef struct {
void* head; // BASE pointer to first block (or NULL)
uint16_t count; // Number of blocks in this tcache
} TinyTcacheBin;
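// Thread-local storage: 8 classes (C0-C7)
// NOTE: `static` gives every translation unit that includes this header its
// own copy of the bins, so a push in one .c file is invisible to a pop in
// another. If push/pop sites live in different TUs, this needs a single
// extern definition instead.
static __thread TinyTcacheBin g_tiny_tcache_bins[TINY_NUM_CLASSES];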
// ============================================================================
// Push (try to add block to tcache)
// ============================================================================
//
// Arguments:
// class_idx - Tiny class index (0-7)
// base - BASE pointer to freed block
//
// Returns:
// true - Block accepted into tcache
// false - Tcache full (overflow), caller should use unified_cache
//
// Side effects:
// - Writes intrusive next pointer into block (via tiny_next_store)
// - Updates head and count
//
static inline bool tiny_tcache_try_push(int class_idx, void* base) {
// ENV gate check (cached, should be fast)
if (!tiny_tcache_enabled()) {
return false; // Tcache disabled, fall through to unified_cache
}
TinyTcacheBin* bin = &g_tiny_tcache_bins[class_idx];
uint16_t cap = tiny_tcache_cap();
// Check capacity
if (bin->count >= cap) {
return false; // Overflow, delegate to unified_cache
}
// Debug: validate base pointer
#if !HAKMEM_BUILD_RELEASE
if (base == NULL || (uintptr_t)base < 4096) {
fprintf(stderr, "[TINY_TCACHE] BUG: invalid base=%p in try_push (class=%d)\n", base, class_idx);
abort();
}
#endif
// LIFO push: link block to current head
tiny_next_store(base, class_idx, bin->head);
bin->head = base;
bin->count++;
// Debug: check invariant
#if !HAKMEM_BUILD_RELEASE
assert(bin->count <= cap);
#endif
return true; // Block accepted
}
// ============================================================================
// Pop (try to get block from tcache)
// ============================================================================
//
// Arguments:
// class_idx - Tiny class index (0-7)
//
// Returns:
// BASE pointer - Block from tcache (LIFO order)
// NULL - Tcache empty, caller should use unified_cache
//
// Side effects:
// - Reads intrusive next pointer from block (via tiny_next_load)
// - Updates head and count
//
static inline void* tiny_tcache_try_pop(int class_idx) {
// ENV gate check (cached, should be fast)
if (!tiny_tcache_enabled()) {
return NULL; // Tcache disabled, fall through to unified_cache
}
TinyTcacheBin* bin = &g_tiny_tcache_bins[class_idx];
// Check if empty
if (bin->head == NULL) {
return NULL; // Miss, delegate to unified_cache
}
// LIFO pop: unlink head
void* base = bin->head;
// Debug: validate the head BEFORE dereferencing it via tiny_next_load,
// and guard against count underflow
#if !HAKMEM_BUILD_RELEASE
if ((uintptr_t)base < 4096 || bin->count == 0) {
fprintf(stderr, "[TINY_TCACHE] BUG: invalid base=%p or count=0 in try_pop (class=%d)\n", base, class_idx);
abort();
}
#endif
void* next = tiny_next_load(base, class_idx);
bin->head = next;
bin->count--;
return base; // Hit (BASE pointer)
}
// ============================================================================
// Stats (optional, for diagnostics)
// ============================================================================
// Get current count for a class (debug/stats only)
static inline uint16_t tiny_tcache_count(int class_idx) {
return g_tiny_tcache_bins[class_idx].count;
}
#endif // TINY_TCACHE_BOX_H

@@ -0,0 +1,68 @@
// ============================================================================
// Phase 14 v1: Tiny tcache ENV Box (L0) - Implementation
// ============================================================================
#include "tiny_tcache_env_box.h"
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
// ============================================================================
// Global State
// ============================================================================
_Atomic int g_tiny_tcache_enabled = -1;
_Atomic uint16_t g_tiny_tcache_cap = 0;
// ============================================================================
// Init (Cold Path)
// ============================================================================
int tiny_tcache_env_init(void) {
const char* env_enabled = getenv("HAKMEM_TINY_TCACHE");
const char* env_cap = getenv("HAKMEM_TINY_TCACHE_CAP");
int enabled = 0; // default: OFF (opt-in)
uint16_t cap = 64; // default: 64 (glibc tcache-like)
// Parse HAKMEM_TINY_TCACHE
if (env_enabled && (env_enabled[0] == '1' || strcmp(env_enabled, "true") == 0 || strcmp(env_enabled, "TRUE") == 0)) {
enabled = 1;
}
// Parse HAKMEM_TINY_TCACHE_CAP
if (env_cap && *env_cap) {
int parsed = atoi(env_cap);
if (parsed > 0 && parsed <= 65535) {
cap = (uint16_t)parsed;
}
}
// Cache results
atomic_store_explicit(&g_tiny_tcache_enabled, enabled, memory_order_relaxed);
atomic_store_explicit(&g_tiny_tcache_cap, cap, memory_order_relaxed);
// Log once (stderr for immediate visibility)
if (enabled) {
char msg[128];
int n = snprintf(msg, sizeof(msg), "[TINY_TCACHE] enabled (cap=%u)\n", (unsigned)cap);
if (n > 0 && n < (int)sizeof(msg)) {
ssize_t w = write(2, msg, (size_t)n);
(void)w;
}
}
return enabled;
}
// ============================================================================
// Refresh (Cold Path, called from bench_profile)
// ============================================================================
void tiny_tcache_env_refresh_from_env(void) {
// Reset to uninitialized state (-1 / 0)
// Next call to tiny_tcache_enabled() / tiny_tcache_cap() will re-read ENV
atomic_store_explicit(&g_tiny_tcache_enabled, -1, memory_order_relaxed);
atomic_store_explicit(&g_tiny_tcache_cap, 0, memory_order_relaxed);
}

@ -0,0 +1,93 @@
// ============================================================================
// Phase 14 v1: Tiny tcache ENV Box (L0)
// ============================================================================
//
// Purpose: ENV gate for tcache-style intrusive LIFO cache
//
// Design: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md
// Instructions: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md
//
// Strategy:
// - Add intrusive LIFO tcache layer before array-based UnifiedCache
// - Reduce pointer-chase overhead (system malloc tcache pattern)
// - Hit: head pointer + intrusive next only (no array access)
//
// ENV:
// HAKMEM_TINY_TCACHE=0/1 (default: 0, opt-in)
// HAKMEM_TINY_TCACHE_CAP=64 (default: 64, per-class capacity)
//
// API:
// tiny_tcache_enabled() -> int
// tiny_tcache_cap() -> uint16_t
// tiny_tcache_env_refresh_from_env()
//
// Box Theory:
// - L0: This file (ENV gate, reversible)
// - L1: tiny_tcache_box.h (intrusive LIFO logic)
// - L2: tiny_unified_cache.h (integration point)
//
// Safety:
// - ENV-gated (default OFF, opt-in)
// - Reversible (ENV toggle)
// - No call site changes (integration inside unified_cache)
//
// ============================================================================
#ifndef TINY_TCACHE_ENV_BOX_H
#define TINY_TCACHE_ENV_BOX_H
#include <stdint.h>
#include <stdatomic.h>
// ============================================================================
// Global State (L0)
// ============================================================================
// Cached state: -1 (uninitialized), 0 (disabled), 1 (enabled)
extern _Atomic int g_tiny_tcache_enabled;
// Cached capacity: 0 (uninitialized), >0 (cap value)
extern _Atomic uint16_t g_tiny_tcache_cap;
// ============================================================================
// Hot Inline API (L0)
// ============================================================================
// Check if tcache is enabled
// Returns: 1 if enabled, 0 if disabled
static inline int tiny_tcache_enabled(void) {
int val = atomic_load_explicit(&g_tiny_tcache_enabled, memory_order_relaxed);
if (__builtin_expect(val == -1, 0)) {
// Lazy init: read ENV once
extern int tiny_tcache_env_init(void);
val = tiny_tcache_env_init();
}
return val;
}
// Get tcache capacity per class
// Returns: capacity (default 64)
static inline uint16_t tiny_tcache_cap(void) {
uint16_t cap = atomic_load_explicit(&g_tiny_tcache_cap, memory_order_relaxed);
if (__builtin_expect(cap == 0, 0)) {
// Lazy init: read ENV once
extern int tiny_tcache_env_init(void);
tiny_tcache_env_init();
cap = atomic_load_explicit(&g_tiny_tcache_cap, memory_order_relaxed);
}
return cap;
}
// ============================================================================
// Cold API (L0)
// ============================================================================
// Refresh ENV cache (called from bench_profile after putenv)
// Pattern: Same as Phase 8/13 (FREE_STATIC_ROUTE, C7_PRESERVE_HEADER)
extern void tiny_tcache_env_refresh_from_env(void);
#endif // TINY_TCACHE_ENV_BOX_H

@ -30,6 +30,7 @@
#include "../hakmem_tiny_config.h" // For TINY_NUM_CLASSES
#include "../box/ptr_type_box.h" // Phantom pointer types (BASE/USER)
#include "../box/tiny_front_config_box.h" // Phase 8-Step1: Config macros
#include "../box/tiny_tcache_box.h" // Phase 14 v1: Intrusive LIFO tcache
// ============================================================================
// Phase 3 C2 Patch 3: Bounds Check Compile-out
@ -220,9 +221,16 @@ static inline int unified_cache_push(int class_idx, hak_base_ptr_t base) {
// Fast path: Unified cache disabled → return 0 (not handled)
if (__builtin_expect(!TINY_FRONT_UNIFIED_CACHE_ENABLED, 0)) return 0;
void* base_raw = HAK_BASE_TO_RAW(base);
// Phase 14 v1: Try tcache first (intrusive LIFO, no array access)
if (tiny_tcache_try_push(class_idx, base_raw)) {
return 1; // SUCCESS (tcache hit, no array access)
}
// Tcache overflow or disabled → fall through to array cache
TinyUnifiedCache* cache = &g_unified_cache[class_idx]; // 1 cache miss (TLS)
// Phase 8-Step3: Lazy init check (conditional in PGO mode)
// PGO builds assume bench_fast_init() prewarmed cache → remove check (-1 branch)
#if !HAKMEM_TINY_FRONT_PGO
@ -281,7 +289,23 @@ static inline hak_base_ptr_t unified_cache_pop_or_refill(int class_idx) {
}
#endif
// Phase 14 v1: Try tcache first (intrusive LIFO, no array access)
void* tcache_base = tiny_tcache_try_pop(class_idx);
if (tcache_base != NULL) {
#if !HAKMEM_BUILD_RELEASE
g_unified_cache_hit[class_idx]++;
#endif
// Performance measurement: count cache hits (ENV enabled only)
if (__builtin_expect(unified_cache_measure_check(), 0)) {
atomic_fetch_add_explicit(&g_unified_cache_hits_global,
1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_hits_by_class[class_idx],
1, memory_order_relaxed);
}
return HAK_BASE_FROM_RAW(tcache_base); // HIT (tcache, no array access)
}
// Tcache miss or disabled → try pop from array cache (fast path)
if (__builtin_expect(cache->head != cache->tail, 1)) {
void* base = cache->slots[cache->head]; // 1 cache miss (array access)
cache->head = (cache->head + 1) & cache->mask;

@ -0,0 +1,184 @@
# Phase 14 v1: Pointer-Chase Reduction (tcache-style) A/B Test Results
**Date:** 2025-12-15
**Benchmark:** Mixed (16-1024B) 10-run cleanenv
**Target:** Reduce pointer-chase overhead with intrusive LIFO tcache layer
**Expected ROI:** +15-25% (design estimate)
**GO Threshold:** +1.0% mean improvement
---
## 1. Implementation Summary
Phase 14 v1 adds an intrusive LIFO tcache layer (L1) in front of the existing array-based UnifiedCache, inspired by the glibc tcache pattern.
**Key Components:**
- `core/box/tiny_tcache_env_box.{h,c}` - L0 ENV gate (HAKMEM_TINY_TCACHE=0/1, default 0)
- `core/box/tiny_tcache_box.h` - L1 intrusive LIFO cache (TLS per-class bins)
- `core/front/tiny_unified_cache.h` - Integration (try tcache first, fall through to array cache)
- `core/bench_profile.h` - Refresh sync for bench_profile
**Important Note (wiring completeness):**
- In v1 the tcache pop lives in `unified_cache_pop_or_refill()`, but the current main alloc hot path (`tiny_hot_alloc_fast()`) does not go through `unified_cache_pop_or_refill()`.
- The free side, on the other hand, enters the tcache via `unified_cache_push()`, so with `HAKMEM_TINY_TCACHE=1` the **tcache may turn into a "sink", leaving the ROI of the alloc/pop side unmeasurable**.
- Recommended follow-up (Phase 14 v2): connect pop/push in `tiny_front_hot_box` and rerun the A/B:
- `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`
**Design:**
- TLS state: 8 TinyTcacheBin structs (head pointer + count per class)
- Intrusive next pointers stored in blocks (via tiny_next_store/load SSOT)
- Capacity: 64 blocks per class (default, configurable via HAKMEM_TINY_TCACHE_CAP)
- LIFO order: the most recently freed (likely still cache-hot) block is reused first, whereas the FIFO array cache hands out the oldest block
- Two-layer fallback: tcache (fast) → unified_cache (overflow/miss)
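The following is a minimal C sketch of the per-class intrusive LIFO described above. It assumes every block is at least pointer-sized, so its first word can hold the next pointer. `SketchTcacheBin`, `sketch_tcache_try_push()` and `sketch_tcache_try_pop()` are hypothetical stand-ins. Per the design, the real `tiny_tcache_box.h` routes the next pointer through `tiny_next_store/load` and reads the cap from the ENV box; this sketch replaces the cap with a plain parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_CLASSES 8  /* mirrors the 8 per-class bins above */

/* One bin per class: only a head pointer and a count, no slot array. */
typedef struct {
    void*    head;   /* most recently freed block (top of the LIFO) */
    uint16_t count;  /* number of blocks currently in the bin */
} SketchTcacheBin;

static _Thread_local SketchTcacheBin t_sketch_bins[SKETCH_NUM_CLASSES];

/* Intrusive link: the next pointer lives in the free block's first word
 * (hypothetical stand-in for tiny_next_store / tiny_next_load). */
static inline void sketch_next_store(void* base, void* next) { *(void**)base = next; }
static inline void* sketch_next_load(void* base) { return *(void**)base; }

/* Returns 1 if the block was cached, 0 on overflow so the caller can fall
 * through to the array-based UnifiedCache. */
static inline int sketch_tcache_try_push(int class_idx, void* base, uint16_t cap) {
    assert(class_idx >= 0 && class_idx < SKETCH_NUM_CLASSES);
    SketchTcacheBin* bin = &t_sketch_bins[class_idx];
    if (bin->count >= cap) return 0;
    sketch_next_store(base, bin->head);
    bin->head = base;
    bin->count++;
    return 1;
}

/* Returns the most recently pushed block, or NULL on a miss. */
static inline void* sketch_tcache_try_pop(int class_idx) {
    assert(class_idx >= 0 && class_idx < SKETCH_NUM_CLASSES);
    SketchTcacheBin* bin = &t_sketch_bins[class_idx];
    void* base = bin->head;
    if (base == NULL) return NULL;
    bin->head = sketch_next_load(base);
    bin->count--;
    return base;
}
```

A hit touches only `bin->head`, `bin->count` and the block's first word, which matches the "head pointer + intrusive next only (no array access)" goal. A miss or an overflow costs one extra branch before the existing array path.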
**ENV Control:**
```bash
export HAKMEM_TINY_TCACHE=0 # Baseline (tcache disabled)
export HAKMEM_TINY_TCACHE=1 # Optimized (tcache enabled)
export HAKMEM_TINY_TCACHE_CAP=64 # Capacity per class (default: 64)
```
---
## 2. A/B Test Results (Mixed 10-run)
### Baseline (TCACHE=0):
```
Run 1: 51,264,525 ops/s
Run 2: 50,950,925 ops/s
Run 3: 51,500,295 ops/s
Run 4: 51,698,050 ops/s
Run 5: 50,396,686 ops/s
Run 6: 50,960,807 ops/s
Run 7: 50,616,179 ops/s
Run 8: 51,817,424 ops/s
Run 9: 50,762,958 ops/s
Run 10: 50,865,941 ops/s
```
**Mean:** 51,083,379 ops/s
**Median:** 50,955,866 ops/s
### Optimized (TCACHE=1):
```
Run 1: 51,555,414 ops/s
Run 2: 51,389,988 ops/s
Run 3: 50,795,917 ops/s
Run 4: 51,880,520 ops/s
Run 5: 50,574,457 ops/s
Run 6: 50,627,901 ops/s
Run 7: 51,233,081 ops/s
Run 8: 51,278,890 ops/s
Run 9: 50,761,326 ops/s
Run 10: 51,770,890 ops/s
```
**Mean:** 51,186,838 ops/s
**Median:** 51,255,986 ops/s
### Delta:
- **Mean delta:** +0.20% (103,459 ops/s improvement)
- **Median delta:** +0.59% (300,120 ops/s improvement)
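The summary statistics can be recomputed from the raw runs in Section 5 with a short C program. The helper names are illustrative. The median uses the usual even-count convention (average of the two middle values), which reproduces the figures reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Raw throughput (ops/s) from Section 5. */
static const long long k_baseline[10] = {
    51264525, 50950925, 51500295, 51698050, 50396686,
    50960807, 50616179, 51817424, 50762958, 50865941 };
static const long long k_tcache[10] = {
    51555414, 51389988, 50795917, 51880520, 50574457,
    50627901, 51233081, 51278890, 50761326, 51770890 };

static int cmp_ll(const void* a, const void* b) {
    long long x = *(const long long*)a, y = *(const long long*)b;
    return (x > y) - (x < y);
}

static double mean_ll(const long long* v, size_t n) {
    long long sum = 0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return (double)sum / (double)n;
}

/* Median on a sorted copy; for even n, average the two middle values. */
static double median_ll(const long long* v, size_t n) {
    long long tmp[64];
    assert(n > 0 && n <= 64);
    memcpy(tmp, v, n * sizeof *v);
    qsort(tmp, n, sizeof *tmp, cmp_ll);
    return (n % 2) ? (double)tmp[n / 2]
                   : ((double)tmp[n / 2 - 1] + (double)tmp[n / 2]) / 2.0;
}

/* Relative change of opt vs. base, in percent. */
static double pct_delta(double base, double opt) {
    return (opt - base) / base * 100.0;
}
```

For these runs, `pct_delta(mean_ll(k_baseline, 10), mean_ll(k_tcache, 10))` comes out at about +0.20%, and the median version at about +0.59%. Both match the deltas above.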
---
## 3. Verdict: NEUTRAL
**Result:** +0.20% mean improvement (below +1.0% GO threshold)
**Analysis:**
- Phase 14 v1 shows minimal performance impact on Mixed workload
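- The median delta (+0.59%) is larger than the mean delta, but both sit well inside the run-to-run spread (baseline runs range from 50.40M to 51.82M ops/s, about 2.8%), so the gap between them is not evidence of a stability improvement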
- Both deltas are below the +1.0% GO threshold → NEUTRAL classification
- Expected ROI (+15-25%) was not achieved
**Possible Reasons for Lower-than-Expected ROI:**
1. **Workload Mismatch:** Mixed workload (16-1024B) spans multiple classes (C0-C7), but tcache benefits may be concentrated in hot classes (C2/C3: 128B/256B). Mid/large classes (C5-C7) may not benefit as much.
2. **Cache Locality vs Array Access:** While tcache reduces pointer-chasing, the existing UnifiedCache array access may already be well-cached in L1/L2, limiting improvement.
3. **Cap Too Small:** Default cap=64 may be too small for high-churn workloads, causing frequent overflow to array cache.
4. **Intrusive Next Overhead:** Writing/reading next pointers may add overhead that offsets the pointer-chase reduction.
5. **Incomplete hot-path coverage (v1):** Only the free side feeds the tcache, and the alloc side never consumes it, so tcache hits may be invisible. Phase 14 v2 must confirm that the tcache path is actually live.
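Reason 5 can be illustrated with a toy counting model; no real memory is touched. Frees always try the tcache first, but under the v1 wiring the main alloc path never pops from it. `SinkModel` and the `model_*` functions are hypothetical. The cap of 64 mirrors the default, and each loop iteration stands for one alloc/free pair in a single size class.

```c
#include <assert.h>

typedef struct {
    unsigned cap;            /* tcache capacity per class */
    unsigned tcache_count;   /* blocks parked in the tcache */
    unsigned array_count;    /* blocks in the array (FIFO) cache */
    unsigned tcache_pops;    /* allocations served by the tcache */
    unsigned refills;        /* allocations that had to go to the backend */
} SinkModel;

/* Free side (same in both wirings): tcache first, overflow to the array cache. */
static void model_free(SinkModel* m) {
    if (m->tcache_count < m->cap) m->tcache_count++;
    else m->array_count++;
}

/* v1 alloc: tiny_hot_alloc_fast() only looks at the array cache. */
static void model_alloc_v1(SinkModel* m) {
    if (m->array_count > 0) m->array_count--;
    else m->refills++;
}

/* Alloc as intended for Phase 14 v2: tcache first, then the array cache. */
static void model_alloc_v2(SinkModel* m) {
    if (m->tcache_count > 0) { m->tcache_count--; m->tcache_pops++; }
    else if (m->array_count > 0) m->array_count--;
    else m->refills++;
}
```

The model was run for 1,000 alloc/free pairs under each wiring:

- **v1:** ends with 64 blocks stranded in the tcache, 65 backend refills and zero tcache pops.
- **v2:** serves 999 of the 1,000 allocations from the tcache after a single refill.

In the v1 case the tcache only absorbs the first `cap` frees. After that, the workload runs on the array cache exactly as it would with the tcache disabled.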
**Comparison to Smoke Test:**
- Smoke test (single run): +2.4% (51.88M vs 50.68M ops/s)
- Formal 10-run: +0.20% mean, +0.59% median
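- Both smoke-test values fall inside the 10-run ranges (baseline 50.40M to 51.82M, optimized 50.57M to 51.88M ops/s), so the single-run +2.4% most likely reflects run-to-run variance rather than a real effect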
---
## 4. Recommendation: Freeze as Research Box
**Decision:** Freeze Phase 14 v1 as research box (default OFF)
**Rationale:**
- NEUTRAL result (+0.20%) does not justify promotion to default
- No measurable harm (close to baseline), suitable for research/experimentation
- Future work may explore:
- Per-class cap tuning (hot classes get larger caps)
- Workload-specific profiling (C2/C3-heavy vs C5-C7-heavy)
- Alternative intrusive next pointer strategies
**Next Steps:**
1. Commit Phase 14 v1 implementation with NEUTRAL verdict
2. Update CURRENT_TASK.md to freeze as research box
3. Keep ENV gate (HAKMEM_TINY_TCACHE=0 default) for future experimentation
4. Consider alternative approaches for pointer-chase reduction (e.g., deeper pipeline optimization, better prefetching)
---
## 5. Raw Data
### Baseline (TCACHE=0):
```
51264525
50950925
51500295
51698050
50396686
50960807
50616179
51817424
50762958
50865941
```
Mean: 51,083,379 ops/s
Median: 50,955,866 ops/s
### Optimized (TCACHE=1):
```
51555414
51389988
50795917
51880520
50574457
50627901
51233081
51278890
50761326
51770890
```
Mean: 51,186,838 ops/s
Median: 51,255,986 ops/s
---
## 6. Files Modified
### Created:
- `core/box/tiny_tcache_env_box.h` - L0 ENV gate
- `core/box/tiny_tcache_env_box.c` - ENV init/refresh implementation
- `core/box/tiny_tcache_box.h` - L1 intrusive LIFO cache
### Modified:
- `core/front/tiny_unified_cache.h` - Integration (try tcache first)
- `core/bench_profile.h` - Refresh sync
- `Makefile` - Build system integration
- `scripts/run_mixed_10_cleanenv.sh` - ENV leak prevention (already updated)
---
## 7. Conclusion
Phase 14 v1 (Pointer-Chase Reduction via tcache-style intrusive LIFO) achieved **+0.20% mean improvement** on Mixed 10-run benchmark, which is **NEUTRAL** (below +1.0% GO threshold).
**Final Status:** Freeze as research box (HAKMEM_TINY_TCACHE=0 default, OFF)
**Future Work:** Consider per-class cap tuning or alternative pointer-chase reduction strategies.

@ -147,3 +147,12 @@ GO/NO-GO:
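- With a high tcache hit rate, array access (and the FIFO's reuse of older blocks) can be avoided
- It is the shortest step toward what makes system malloc fast (its tcache-like behavior)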
---
## Update (2025-12-15)
With only the v1 integration point (`core/front/tiny_unified_cache.h`), the current main alloc hot path (`tiny_hot_alloc_fast()`) does not consume the tcache,
so with `HAKMEM_TINY_TCACHE=1` the tcache tends to become a sink.
Next: connect pop/push in the hot path (`core/box/tiny_front_hot_box.h`) and rerun the A/B with the tcache actually live (Phase 14 v2):
- `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`

@ -109,3 +109,12 @@ When GO:
When NO-GO/NEUTRAL:
- Freeze as research box (keep default OFF)
---
## Update (2025-12-15)
With only the v1 integration point (`core/front/tiny_unified_cache.h`), the current main alloc hot path (`tiny_hot_alloc_fast()`) does not consume the tcache,
so with `HAKMEM_TINY_TCACHE=1` the tcache tends to become a "sink".
Next: connect pop/push in the hot path (`core/box/tiny_front_hot_box.h`) and rerun the A/B with the tcache actually live (Phase 14 v2):
- `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`

@ -0,0 +1,131 @@
# Phase 14 v2: Pointer-Chase Reduction / Hot Path Integration Next Instructions (Tiny tcache intrusive LIFO)
## Status
- Phase 14 v1 (adding the tcache L1) was **NEUTRAL** on Mixed 10-run (+0.20% mean / +0.59% median)
  - Results: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md`
  - Implementation: `core/box/tiny_tcache_env_box.{h,c}` / `core/box/tiny_tcache_box.h` / `core/front/tiny_unified_cache.h`
- However, the current v1 **pushes into the tcache only on the free side (`unified_cache_push()`), and the alloc side (`tiny_hot_alloc_fast()`) never consumes it**, so:
  - the tcache is effectively a sink, and the ROI cannot be measured correctly
  - the premise of "tcache-style" (push/pop symmetry) is broken
Phase 14 v2 connects the tcache to the **actual tiny front hot path** and reruns a valid A/B.
---
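## 0. Goal (GO criteria)
Mixed 10-run (clean env):
- **GO**: mean +1.0% or better
- **NO-GO**: mean -1.0% or worse (immediate rollback / freeze)
- **NEUTRAL**: within ±1.0% (freeze as research box)
Additional gate (mandatory):
- With `HAKMEM_TINY_TCACHE=1`, **tcache pops must actually occur** (zero pops means the design is not wired in)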
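The decision rule, including the mandatory pop gate, fits in a few lines. This sketch uses hypothetical names and takes two inputs:

- the mean delta in percent between the two 10-run sets;
- the number of tcache pops observed in the `HAKMEM_TINY_TCACHE=1` run, for example from counters like those proposed in Patch 3.

```c
#include <assert.h>

typedef enum {
    VERDICT_GO,
    VERDICT_NEUTRAL,
    VERDICT_NO_GO,
    VERDICT_NOT_WIRED   /* gate failed: tcache never popped, so the A/B is not valid */
} Phase14Verdict;

static Phase14Verdict phase14_verdict(double mean_delta_pct,
                                      unsigned long long tcache_pops) {
    assert(mean_delta_pct == mean_delta_pct && "NaN delta");
    if (tcache_pops == 0) return VERDICT_NOT_WIRED;   /* check the gate first */
    if (mean_delta_pct >= 1.0) return VERDICT_GO;     /* +1.0% or better */
    if (mean_delta_pct <= -1.0) return VERDICT_NO_GO; /* -1.0% or worse */
    return VERDICT_NEUTRAL;
}
```

The gate is checked first on purpose. Otherwise a small delta such as v1's +0.20% would be classified as NEUTRAL even if the tcache pop path was never exercised.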
---
## 1. Box diagram (single boundary)
```
L0: tiny_tcache_env_box (ENV gate / refresh / rollback)
L1: tiny_tcache_box (intrusive LIFO: push/pop, cap)
L2: tiny_front_hot_box (hot alloc/free: tcache → unified_cache(FIFO))
L3: cold/refill (unified_cache_refill → SuperSlab)
```
The boundary is fixed at exactly one point: **"tcache miss/overflow → existing UnifiedCache"**.
---
## 2. Patch order (stack small steps)
### Patch 1: Connect tcache pop to hot alloc (mandatory)
Target:
- `core/box/tiny_front_hot_box.h`
Change:
- At the top of `tiny_hot_alloc_fast(int class_idx)`:
  - try `tiny_tcache_try_pop(class_idx)`
  - on HIT, return immediately via `tiny_header_finalize_alloc(base, class_idx)`
  - on MISS, fall back to the existing FIFO (`cache->slots[head]`)
Requirements:
- Keep the diff minimal so the hot path does not bloat when tcache is OFF (default)
- Strictly follow "when in doubt, fall back" (Fail-Fast)
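Below is a sketch of Patch 1 in C, built on simplified stand-ins:

- `SketchBin` stands in for the TLS tcache bin.
- `SketchFifo` stands in for `TinyUnifiedCache`, modeled as the head/tail/mask ring used in `unified_cache_pop_or_refill()`.
- A plain flag stands in for `tiny_tcache_enabled()`.
- An identity `sketch_header_finalize()` stands in for `tiny_header_finalize_alloc()`.

None of these are the real hakmem definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { void* head; uint16_t count; } SketchBin;               /* TLS tcache bin */
typedef struct { void** slots; uint16_t head, tail, mask; } SketchFifo;  /* array cache ring */

static int g_sketch_tcache_on = 0;  /* stand-in for tiny_tcache_enabled() */

/* Intrusive LIFO pop; NULL when disabled or empty. */
static void* sketch_tcache_try_pop(SketchBin* bin) {
    if (!g_sketch_tcache_on) return NULL;  /* OFF: a single well-predicted branch */
    void* base = bin->head;
    if (base == NULL) return NULL;         /* tcache miss */
    assert(bin->count > 0);
    bin->head = *(void**)base;             /* next pointer in the block's first word */
    bin->count--;
    return base;
}

/* Stand-in for tiny_header_finalize_alloc(): the real function presumably
 * finalizes the block header; this sketch returns the block unchanged. */
static void* sketch_header_finalize(void* base) { return base; }

static void* sketch_hot_alloc_fast(SketchBin* bin, SketchFifo* fifo) {
    void* base = sketch_tcache_try_pop(bin);
    if (base != NULL) return sketch_header_finalize(base);   /* tcache HIT */
    if (fifo->head != fifo->tail) {                           /* MISS: existing FIFO */
        base = fifo->slots[fifo->head];
        fifo->head = (uint16_t)((fifo->head + 1) & fifo->mask);
        return sketch_header_finalize(base);
    }
    return NULL;  /* both empty: caller takes the cold refill path */
}
```

With the tcache OFF, the only added cost is the flag check in `sketch_tcache_try_pop()`, in line with the minimal-diff requirement above.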
### Patch 2: Connect tcache push to hot free (recommended)
Target:
- `core/box/tiny_front_hot_box.h`
Change:
- At the top of `tiny_hot_free_fast(int class_idx, void* base)`:
  - try `tiny_tcache_try_push(class_idx, base)`
  - on SUCCESS, `return 1`
  - go to the existing FIFO only on overflow / disabled
Goal:
- Make the tcache effective on "direct push" paths that bypass `unified_cache_push()` as well
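The free side mirrors the alloc side. This is again a sketch with hypothetical stand-ins: `SketchBin`, `SketchFifo`, a flag for `tiny_tcache_enabled()` and a variable for `tiny_tcache_cap()`. The ring keeps one slot empty to distinguish full from empty. That choice is an assumption made for the sketch, not a description of `TinyUnifiedCache`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { void* head; uint16_t count; } SketchBin;
typedef struct { void** slots; uint16_t head, tail, mask; } SketchFifo;

static int g_sketch_tcache_on = 0;        /* stand-in for tiny_tcache_enabled() */
static uint16_t g_sketch_tcache_cap = 64; /* stand-in for tiny_tcache_cap() */

/* Returns 1 if cached; 0 when disabled or at capacity (overflow). */
static int sketch_tcache_try_push(SketchBin* bin, void* base) {
    if (!g_sketch_tcache_on) return 0;
    if (bin->count >= g_sketch_tcache_cap) return 0;
    *(void**)base = bin->head;  /* intrusive next pointer */
    bin->head = base;
    bin->count++;
    return 1;
}

/* Returns 1 if the block was absorbed by the tcache or the FIFO, 0 if the
 * caller must take the slow free path. */
static int sketch_hot_free_fast(SketchBin* bin, SketchFifo* fifo, void* base) {
    assert(base != NULL);
    if (sketch_tcache_try_push(bin, base)) return 1;          /* SUCCESS */
    uint16_t next_tail = (uint16_t)((fifo->tail + 1) & fifo->mask);
    if (next_tail == fifo->head) return 0;                     /* FIFO full */
    fifo->slots[fifo->tail] = base;
    fifo->tail = next_tail;
    return 1;
}
```

Clearing the flag sends every free straight to the FIFO. That mirrors the rollback path of `HAKMEM_TINY_TCACHE=0`.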
### Patch 3: Visibility (minimal, TLS)
Candidate target:
- `core/box/tiny_tcache_box.h` (TLS counters)
Add (debug / research only):
- `tcache_pop_hit/miss`
- `tcache_push_hit/overflow`
- A "one-shot dump" that can be emitted exactly once (ENV opt-in)
Forbidden:
- Atomic statistics on the hot path (lesson from Phase 12 / POOL-DN-BATCH)
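Here is a sketch of the counters, built on these assumptions:

- The counters are plain `_Thread_local` integers, so there are no atomics on the hot path, per the rule above.
- The dump is opted into via an ENV named `HAKMEM_TINY_TCACHE_DUMP`, which is hypothetical and not an existing hakmem knob.
- Only the one-shot guard uses an atomic, and it sits on the cold path.
- The dump reports the calling thread's counters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    uint64_t pop_hit, pop_miss, push_hit, push_overflow;
} SketchTcacheStats;

static _Thread_local SketchTcacheStats t_sketch_stats;  /* per thread, non-atomic */
static atomic_int g_sketch_dumped;                      /* cold-path one-shot guard */

/* Hot path: one plain TLS increment, e.g. SKETCH_STAT_INC(pop_hit). */
#define SKETCH_STAT_INC(field) (t_sketch_stats.field++)

static int sketch_stats_format(char* buf, size_t n) {
    const SketchTcacheStats* s = &t_sketch_stats;
    return snprintf(buf, n,
                    "[TINY_TCACHE] pop hit=%llu miss=%llu push hit=%llu overflow=%llu\n",
                    (unsigned long long)s->pop_hit, (unsigned long long)s->pop_miss,
                    (unsigned long long)s->push_hit, (unsigned long long)s->push_overflow);
}

/* Cold path: prints at most once per process, only when opted in.
 * Returns 1 if this call printed. */
static int sketch_stats_dump_once(void) {
    const char* e = getenv("HAKMEM_TINY_TCACHE_DUMP");  /* hypothetical ENV */
    if (e == NULL || e[0] != '1') return 0;
    if (atomic_exchange(&g_sketch_dumped, 1) != 0) return 0;
    char buf[160];
    int len = sketch_stats_format(buf, sizeof buf);
    assert(len > 0 && len < (int)sizeof buf);
    fputs(buf, stderr);
    return 1;
}
```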
---
## 3. A/B test (same binary)
Baseline:
```sh
HAKMEM_TINY_TCACHE=0 scripts/run_mixed_10_cleanenv.sh
```
Optimized:
```sh
HAKMEM_TINY_TCACHE=1 scripts/run_mixed_10_cleanenv.sh
```
Additional (check whether the effect is class-dependent):
```sh
HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_TCACHE=0 scripts/run_mixed_10_cleanenv.sh
HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_TCACHE=1 scripts/run_mixed_10_cleanenv.sh
```
Cap sweep (research, only when needed):
```sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=32 scripts/run_mixed_10_cleanenv.sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=64 scripts/run_mixed_10_cleanenv.sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=128 scripts/run_mixed_10_cleanenv.sh
```
---
## 4. Health check (mandatory)
```sh
scripts/verify_health_profiles.sh
```
---
## 5. Verdict and handling
- GO: promote in `bench_profile` starting with **MIXED_TINYV3_C7_SAFE only** (staged rollout)
- NEUTRAL/NO-GO: freeze Phase 14 v2 as a research box (keep default OFF)
- Rollback:
- `export HAKMEM_TINY_TCACHE=0`