Phase 14 v1: Pointer-Chase Reduction (tcache) NEUTRAL (+0.20%)
Implementation:
- Intrusive LIFO tcache layer (L1) before UnifiedCache
- TLS per-class bins (head pointer + count)
- Intrusive next pointers (via tiny_next_store/load SSOT)
- Cap: 64 blocks per class (default)
- ENV: HAKMEM_TINY_TCACHE=0/1 (default: 0, OFF)
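As a rough illustration of the ENV gate listed above, the sketch below caches both knobs and re-reads them on demand. It is a minimal sketch under assumptions, not the real tiny_tcache_env_box code (which is not shown in this part of the commit): the sketch_* names are hypothetical, the cap variable HAKMEM_TINY_TCACHE_CAP is taken from the docs hunk of this commit, and the accepted cap range (1 to 65535) and the "first character is 1" parsing rule are guesses.

```c
#define _POSIX_C_SOURCE 200809L // setenv() is POSIX, not ISO C
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical cached copies of the two knobs; not the real hakmem symbols.
static bool     sketch_tcache_on  = false; // HAKMEM_TINY_TCACHE (default 0 = OFF)
static uint16_t sketch_tcache_cap = 64;    // HAKMEM_TINY_TCACHE_CAP (default 64)

// Re-read the environment into the cached copies.
static void sketch_tcache_refresh_from_env(void) {
    const char* on = getenv("HAKMEM_TINY_TCACHE");
    sketch_tcache_on = (on != NULL && on[0] == '1');

    sketch_tcache_cap = 64;
    const char* cap = getenv("HAKMEM_TINY_TCACHE_CAP");
    if (cap != NULL) {
        long v = strtol(cap, NULL, 10);
        if (v > 0 && v <= UINT16_MAX) { // assumed valid range; otherwise keep 64
            sketch_tcache_cap = (uint16_t)v;
        }
    }
    assert(sketch_tcache_cap > 0);
}

// A benchmark profile that installs a default only when the variable is
// unset (overwrite = 0), then refreshes the cache so the gate sees it.
static void sketch_profile_opt_in(void) {
    setenv("HAKMEM_TINY_TCACHE", "1", 0);
    sketch_tcache_refresh_from_env();
}
```

The refresh step matters because the cached flag would otherwise keep whatever value it had before the profile changed the environment, which is the reason this commit adds a refresh call to bench_profile.h.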
A/B Test Results (Mixed 10-run):
- Baseline (TCACHE=0): 51,083,379 ops/s
- Optimized (TCACHE=1): 51,186,838 ops/s
- Mean delta: +0.20% (below +1.0% GO threshold)
- Median delta: +0.59%
Verdict: NEUTRAL - Freeze as research box (default OFF)
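The deltas and the verdict above follow from simple arithmetic, sketched here for reference. The helper names are hypothetical, and the two-way GO / not-GO split is a simplification: this commit only states the +1.0% GO threshold, so no regression threshold is modelled.

```c
#include <assert.h>
#include <stdbool.h>

// Relative change of `optimized` versus `baseline`, in percent.
static double delta_percent(double baseline, double optimized) {
    assert(baseline > 0.0);
    return (optimized - baseline) / baseline * 100.0;
}

// True when a delta reaches the GO threshold (+1.0% in this commit).
static bool meets_go_threshold(double delta_pct, double threshold_pct) {
    return delta_pct >= threshold_pct;
}
```

With the mean throughputs above, delta_percent(51083379.0, 51186838.0) comes out at roughly +0.20, and meets_go_threshold(..., 1.0) is false, which matches the NEUTRAL verdict.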
Root Cause (v1 wiring incomplete):
- Free side pushes to tcache via unified_cache_push()
- Alloc hot path (tiny_hot_alloc_fast) doesn't consume tcache
- Without an alloc-side pop, the tcache acts as a pure "sink" → its ROI is not measurable
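The wiring gap described above can be reproduced with a toy single-class model. This is only a sketch under assumptions: the SketchFront type, the 256-slot ring standing in for the array-based cache, and the 128-block pool are invented for illustration. Only the cap of 64 and the shape of the problem (free pushes to the tcache, alloc reads the array cache only) come from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TCACHE_CAP  64  // default cap from this commit
#define SKETCH_ARRAY_SLOTS 256 // hypothetical array-cache size

typedef struct SketchBlock { struct SketchBlock* next; } SketchBlock;

typedef struct {
    SketchBlock* head;                      // tcache: intrusive LIFO
    uint16_t     count;
    SketchBlock* slots[SKETCH_ARRAY_SLOTS]; // array cache: FIFO ring
    uint32_t     rd, len;
    uint64_t     tcache_hits;
} SketchFront;

static bool sketch_tcache_push(SketchFront* f, SketchBlock* b) {
    if (f->count >= SKETCH_TCACHE_CAP) return false; // full: caller falls back
    b->next = f->head;
    f->head = b;
    f->count++;
    return true;
}

static SketchBlock* sketch_tcache_pop(SketchFront* f) {
    SketchBlock* b = f->head;
    if (b == NULL) return NULL;
    f->head = b->next;
    f->count--;
    return b;
}

static bool sketch_array_push(SketchFront* f, SketchBlock* b) {
    if (f->len == SKETCH_ARRAY_SLOTS) return false;
    f->slots[(f->rd + f->len) % SKETCH_ARRAY_SLOTS] = b;
    f->len++;
    return true;
}

static SketchBlock* sketch_array_pop(SketchFront* f) {
    if (f->len == 0) return NULL;
    SketchBlock* b = f->slots[f->rd];
    f->rd = (f->rd + 1) % SKETCH_ARRAY_SLOTS;
    f->len--;
    return b;
}

// Free side: tcache first, array cache on overflow.
static void sketch_free(SketchFront* f, SketchBlock* b) {
    if (!sketch_tcache_push(f, b)) {
        bool ok = sketch_array_push(f, b);
        assert(ok);
        (void)ok;
    }
}

// Alloc side: pop_tcache=false models the v1 wiring, true models the v2 plan.
static SketchBlock* sketch_alloc(SketchFront* f, bool pop_tcache) {
    if (pop_tcache) {
        SketchBlock* b = sketch_tcache_pop(f);
        if (b != NULL) { f->tcache_hits++; return b; }
    }
    return sketch_array_pop(f);
}

// Run `rounds` alloc+free pairs over a pre-filled array cache.
// Returns the number of tcache hits; writes the final tcache depth.
static uint64_t sketch_run(bool pop_tcache, int rounds, uint16_t* depth) {
    static SketchBlock pool[128];
    SketchFront f = {0};
    for (int i = 0; i < 128; i++) sketch_array_push(&f, &pool[i]);
    for (int i = 0; i < rounds; i++) {
        SketchBlock* b = sketch_alloc(&f, pop_tcache);
        assert(b != NULL);
        sketch_free(&f, b);
    }
    *depth = f.count;
    return f.tcache_hits;
}
```

In this model the v1 wiring fills the tcache to its cap and never hits it again, while popping on the alloc side turns nearly every allocation into a tcache hit. That is consistent with the claim that the v1 A/B numbers cannot show the layer's ROI, although the toy model says nothing about how large the real gain would be.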
Files:
- Created: core/box/tiny_tcache_{env_box,box}.h, tiny_tcache_env_box.c
- Modified: core/front/tiny_unified_cache.h (integration)
- Modified: core/bench_profile.h (refresh sync)
- Modified: Makefile (build integration)
- Results: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md
- v2 Instructions: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md
Next: Phase 14 v2 (connect tcache to tiny_front_hot_box alloc/free hot path)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -224,12 +224,49 @@ Cumulative improvements achieved in Phases 6-10:
 
 **Next**: Move on to the next gap hypothesis from the Phase 12 Strategic Pause
 
-### Next: Phase 14 (Pointer Chase Reduction / Tiny tcache)
+### Phase 14 v1: Pointer Chase Reduction (tcache-style) - NEUTRAL (+0.20%) ⚠️ RESEARCH BOX
 
-**Goal**: Move closer to system malloc's tcache design and reduce the Tiny frontend's "array/FIFO/indirection" cost.
+**Date**: 2025-12-15
+**Verdict**: **NEUTRAL (+0.20%)** - Frozen as research box (default OFF, manual opt-in)
 
-- Design: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md`
-- Instructions: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md`
+**Target**: Reduce pointer-chase overhead with an intrusive LIFO tcache layer (inspired by glibc tcache)
+
+**Strategy (v1)**:
+- Add intrusive LIFO tcache layer (L1) before existing array-based UnifiedCache
+- TLS per-class bins (head pointer + count)
+- Intrusive next pointers stored in blocks (via tiny_next_store/load SSOT)
+- Cap: 64 blocks per class (default, configurable)
+- ENV: `HAKMEM_TINY_TCACHE=0/1` (default: 0, OFF)
+
+**Results (Mixed 10-run)**:
+| Case | TCACHE | Mean (ops/s) | Median (ops/s) | Delta |
+|------|--------|--------------|----------------|-------|
+| A (baseline) | 0 | 51,083,379 | 50,955,866 | - |
+| B (optimized) | 1 | 51,186,838 | 51,255,986 | **+0.20%** (mean) / **+0.59%** (median) |
+
+**Key Findings**:
+1. **Mean delta: +0.20%** (below +1.0% GO threshold → NEUTRAL)
+2. **Median delta: +0.59%** (slightly better stability, but still NEUTRAL)
+3. **Expected ROI (+15-25%) not achieved** on Mixed workload
+4. ⚠️ **The v1 integration point is free-side centric: the alloc hot path (`tiny_hot_alloc_fast()`) does not consume the tcache**
+   - Current state: `unified_cache_push()` feeds the tcache, but the alloc side reads only the FIFO (`g_unified_cache[].slots`), so the tcache tends to act as a sink in practice
+   - The v1 A/B test likely underestimates the ROI (Phase 14 v2 must confirm that the path is actually wired end to end)
+
+**Possible Reasons for Lower ROI**:
+- **Workload mismatch**: Mixed (16–1024B) spans C0-C7, but tcache benefits may be concentrated in hot classes (C2/C3)
+- **Existing cache efficiency**: UnifiedCache array access may already be well-cached in L1/L2
+- **Cap too small**: Default cap=64 may cause frequent overflow to array cache
+- **Intrusive next overhead**: Writing/reading next pointers may offset pointer-chase reduction
+
+**Action**:
+- ✅ Freeze Phase 14 v1 as research box (default OFF)
+- ENV: `HAKMEM_TINY_TCACHE=0/1` (default: 0), `HAKMEM_TINY_TCACHE_CAP=64`
+- 📋 Results: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md`
+- 📋 Design: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md`
+- 📋 Instructions: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md`
+- 📋 Next (Phase 14 v2): `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md` (alloc/pop integration)
+
+**Future Work**: Consider per-class cap tuning or alternative pointer-chase reduction strategies
 
 ## Update Notes (2025-12-14 Phase 5 E5-3 Analysis - Strategic Pivot)
 
6
Makefile
@@ -218,12 +218,12 @@ LDFLAGS += $(EXTRA_LDFLAGS)
 
 # Targets
 TARGET = test_hakmem
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o 
core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
|
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o 
core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
 OBJS = $(OBJS_BASE)
 
 # Shared library
 SHARED_LIB = libhakmem.so
SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/malloc_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/box/tiny_c7_preserve_header_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o 
tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o
|
SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/malloc_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/box/tiny_c7_preserve_header_env_box_shared.o core/box/tiny_tcache_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o 
core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o
 
 # Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1)
 ifeq ($(POOL_TLS_PHASE1),1)
@@ -427,7 +427,7 @@ test-box-refactor: box-refactor
 	./larson_hakmem 10 8 128 1024 1 12345 4
 
 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o 
tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
|
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o 
tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
 TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
 ifeq ($(POOL_TLS_PHASE1),1)
 TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
@@ -11,6 +11,7 @@
 #include "box/hakmem_env_snapshot_box.h" // hakmem_env_snapshot_refresh_from_env (Phase 4 E1)
 #include "box/tiny_free_route_cache_env_box.h" // tiny_free_static_route_refresh_from_env (Phase 8)
 #include "box/tiny_c7_preserve_header_env_box.h" // tiny_c7_preserve_header_env_refresh_from_env (Phase 13 v1)
+#include "box/tiny_tcache_env_box.h" // tiny_tcache_env_refresh_from_env (Phase 14 v1)
 #endif
 
 // Set defaults only when the env var is not already set
@@ -187,5 +188,7 @@ static inline void bench_apply_profile(void) {
     tiny_free_static_route_refresh_from_env();
     // Phase 13 v1: Sync C7 preserve header ENV cache after bench_profile putenv defaults.
     tiny_c7_preserve_header_env_refresh_from_env();
+    // Phase 14 v1: Sync tcache ENV cache after bench_profile putenv defaults.
+    tiny_tcache_env_refresh_from_env();
 #endif
 }
162
core/box/tiny_tcache_box.h
Normal file
@ -0,0 +1,162 @@
|
|||||||
|
// ============================================================================
|
||||||
|
// Phase 14 v1: Tiny tcache Box (L1) - Intrusive LIFO Cache
|
||||||
|
// ============================================================================
|
||||||
|
//
|
||||||
|
// Purpose: Per-class intrusive LIFO cache (tcache-style)
|
||||||
|
//
|
||||||
|
// Design: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md
|
||||||
|
//
|
||||||
|
// Strategy:
|
||||||
|
// - Thread-local head pointers + count per class
|
||||||
|
// - Intrusive next pointers stored in blocks (via tiny_next_store/load)
|
||||||
|
// - Cap-limited LIFO (no overflow, delegate to UnifiedCache)
|
||||||
|
// - Hit path: O(1) pointer operations only (no array access)
|
||||||
|
//
|
||||||
|
// Invariants:
|
||||||
|
// - Only BASE pointers stored (never USER pointers)
|
||||||
|
// - count <= cap always
|
||||||
|
// - One block in tcache OR unified_cache, never both
|
||||||
|
// - Next pointer via tiny_next_store/load SSOT only
|
||||||
|
//
|
||||||
|
// API:
|
||||||
|
// tiny_tcache_try_push(class_idx, base) -> bool (handled?)
|
||||||
|
// tiny_tcache_try_pop(class_idx) -> void* (BASE or NULL)
|
||||||
|
//
|
||||||
|
// Safety:
|
||||||
|
// - Debug: assert count <= cap
|
||||||
|
// - Debug: assert base != NULL and reasonable range
|
||||||
|
// - Release: fast path, no checks
|
||||||
|
//
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
#ifndef TINY_TCACHE_BOX_H
|
||||||
|
#define TINY_TCACHE_BOX_H
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stdbool.h>
|
||||||
|
#include <assert.h>
|
||||||
|
#include "../hakmem_build_flags.h"
|
||||||
|
#include "../hakmem_tiny_config.h" // TINY_NUM_CLASSES
|
||||||
|
#include "../tiny_nextptr.h" // tiny_next_store/load SSOT
|
||||||
|
#include "tiny_tcache_env_box.h" // tiny_tcache_enabled/cap
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// TLS State (per-thread, per-class)
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
void* head; // BASE pointer to first block (or NULL)
|
||||||
|
uint16_t count; // Number of blocks in this tcache
|
||||||
|
} TinyTcacheBin;
|
||||||
|
|
||||||
|
// Thread-local storage: 8 classes (C0-C7)
|
||||||
|
static __thread TinyTcacheBin g_tiny_tcache_bins[TINY_NUM_CLASSES];
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Push (try to add block to tcache)
|
||||||
|
// ============================================================================
|
||||||
|
//
|
||||||
|
// Arguments:
|
||||||
|
// class_idx - Tiny class index (0-7)
|
||||||
|
// base - BASE pointer to freed block
|
||||||
|
//
|
||||||
|
// Returns:
|
||||||
|
// true - Block accepted into tcache
|
||||||
|
// false - Tcache full (overflow), caller should use unified_cache
|
||||||
|
//
|
||||||
|
// Side effects:
|
||||||
|
// - Writes intrusive next pointer into block (via tiny_next_store)
|
||||||
|
// - Updates head and count
|
||||||
|
//
|
||||||
|
|
||||||
|
static inline bool tiny_tcache_try_push(int class_idx, void* base) {
|
||||||
|
// ENV gate check (cached, should be fast)
|
||||||
|
if (!tiny_tcache_enabled()) {
|
||||||
|
return false; // Tcache disabled, fall through to unified_cache
|
||||||
|
}
|
||||||
|
|
||||||
|
TinyTcacheBin* bin = &g_tiny_tcache_bins[class_idx];
|
||||||
|
uint16_t cap = tiny_tcache_cap();
|
||||||
|
|
||||||
|
// Check capacity
|
||||||
|
if (bin->count >= cap) {
|
||||||
|
return false; // Overflow, delegate to unified_cache
|
||||||
|
}
|
||||||
|
|
||||||
|
// Debug: validate base pointer
|
||||||
|
#if !HAKMEM_BUILD_RELEASE
|
||||||
|
if (base == NULL || (uintptr_t)base < 4096) {
|
||||||
|
fprintf(stderr, "[TINY_TCACHE] BUG: invalid base=%p in try_push (class=%d)\n", base, class_idx);
|
||||||
|
abort();
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
|
||||||
|
// LIFO push: link block to current head
|
||||||
|
tiny_next_store(base, class_idx, bin->head);
|
||||||
|
bin->head = base;
|
||||||
|
bin->count++;
|
||||||
|
|
||||||
|
// Debug: check invariant
|
||||||
|
#if !HAKMEM_BUILD_RELEASE
|
||||||
|
assert(bin->count <= cap);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
return true; // Block accepted
|
||||||
|
}
|
||||||

// ============================================================================
// Pop (try to get block from tcache)
// ============================================================================
//
// Arguments:
//   class_idx - Tiny class index (0-7)
//
// Returns:
//   BASE pointer - Block from tcache (LIFO order)
//   NULL         - Tcache disabled or empty, caller should use unified_cache
//
// Side effects:
//   - Reads intrusive next pointer from block (via tiny_next_load)
//   - Updates head and count
//

static inline void* tiny_tcache_try_pop(int class_idx) {
    // ENV gate check (cached, should be fast)
    if (!tiny_tcache_enabled()) {
        return NULL;  // Tcache disabled, fall through to unified_cache
    }

    TinyTcacheBin* bin = &g_tiny_tcache_bins[class_idx];

    // Check if empty
    if (bin->head == NULL) {
        return NULL;  // Miss, delegate to unified_cache
    }

    // LIFO pop: unlink head
    void* base = bin->head;

    // Debug: validate the head before it is dereferenced for its next pointer
#if !HAKMEM_BUILD_RELEASE
    if (base == NULL || (uintptr_t)base < 4096) {
        fprintf(stderr, "[TINY_TCACHE] BUG: invalid base=%p in try_pop (class=%d)\n", base, class_idx);
        abort();
    }
#endif

    void* next = tiny_next_load(base, class_idx);
    bin->head = next;
    bin->count--;

    return base;  // Hit (BASE pointer)
}

// ============================================================================
// Stats (optional, for diagnostics)
// ============================================================================

// Get current count for a class (debug/stats only)
static inline uint16_t tiny_tcache_count(int class_idx) {
    return g_tiny_tcache_bins[class_idx].count;
}

#endif // TINY_TCACHE_BOX_H
68
core/box/tiny_tcache_env_box.c
Normal file
@ -0,0 +1,68 @@
// ============================================================================
// Phase 14 v1: Tiny tcache ENV Box (L0) - Implementation
// ============================================================================

#include "tiny_tcache_env_box.h"
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

// ============================================================================
// Global State
// ============================================================================

_Atomic int g_tiny_tcache_enabled = -1;
_Atomic uint16_t g_tiny_tcache_cap = 0;

// ============================================================================
// Init (Cold Path)
// ============================================================================

int tiny_tcache_env_init(void) {
    const char* env_enabled = getenv("HAKMEM_TINY_TCACHE");
    const char* env_cap = getenv("HAKMEM_TINY_TCACHE_CAP");

    int enabled = 0;    // default: OFF (opt-in)
    uint16_t cap = 64;  // default: 64 (glibc tcache-like)

    // Parse HAKMEM_TINY_TCACHE
    if (env_enabled && (env_enabled[0] == '1' || strcmp(env_enabled, "true") == 0 || strcmp(env_enabled, "TRUE") == 0)) {
        enabled = 1;
    }

    // Parse HAKMEM_TINY_TCACHE_CAP
    if (env_cap && *env_cap) {
        int parsed = atoi(env_cap);
        if (parsed > 0 && parsed <= 65535) {
            cap = (uint16_t)parsed;
        }
    }

    // Cache results
    atomic_store_explicit(&g_tiny_tcache_enabled, enabled, memory_order_relaxed);
    atomic_store_explicit(&g_tiny_tcache_cap, cap, memory_order_relaxed);

    // Log once (stderr for immediate visibility)
    if (enabled) {
        char msg[128];
        int n = snprintf(msg, sizeof(msg), "[TINY_TCACHE] enabled (cap=%u)\n", (unsigned)cap);
        if (n > 0 && n < (int)sizeof(msg)) {
            ssize_t w = write(2, msg, (size_t)n);
            (void)w;
        }
    }

    return enabled;
}

// ============================================================================
// Refresh (Cold Path, called from bench_profile)
// ============================================================================

void tiny_tcache_env_refresh_from_env(void) {
    // Reset to uninitialized state (-1 / 0)
    // Next call to tiny_tcache_enabled() / tiny_tcache_cap() will re-read ENV
    atomic_store_explicit(&g_tiny_tcache_enabled, -1, memory_order_relaxed);
    atomic_store_explicit(&g_tiny_tcache_cap, 0, memory_order_relaxed);
}
93
core/box/tiny_tcache_env_box.h
Normal file
@ -0,0 +1,93 @@
// ============================================================================
// Phase 14 v1: Tiny tcache ENV Box (L0)
// ============================================================================
//
// Purpose: ENV gate for tcache-style intrusive LIFO cache
//
// Design: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md
// Instructions: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md
//
// Strategy:
//   - Add intrusive LIFO tcache layer before array-based UnifiedCache
//   - Reduce pointer-chase overhead (system malloc tcache pattern)
//   - Hit: head pointer + intrusive next only (no array access)
//
// ENV:
//   HAKMEM_TINY_TCACHE=0/1      (default: 0, opt-in)
//   HAKMEM_TINY_TCACHE_CAP=64   (default: 64, per-class capacity)
//
// API:
//   tiny_tcache_enabled()  -> int
//   tiny_tcache_cap()      -> uint16_t
//   tiny_tcache_env_refresh_from_env()
//
// Box Theory:
//   - L0: This file (ENV gate, reversible)
//   - L1: tiny_tcache_box.h (intrusive LIFO logic)
//   - L2: tiny_unified_cache.h (integration point)
//
// Safety:
//   - ENV-gated (default OFF, opt-in)
//   - Reversible (ENV toggle)
//   - No call site changes (integration inside unified_cache)
//
// ============================================================================

#ifndef TINY_TCACHE_ENV_BOX_H
#define TINY_TCACHE_ENV_BOX_H

#include <stdint.h>
#include <stdatomic.h>

// ============================================================================
// Global State (L0)
// ============================================================================

// Cached state: -1 (uninitialized), 0 (disabled), 1 (enabled)
extern _Atomic int g_tiny_tcache_enabled;

// Cached capacity: 0 (uninitialized), >0 (cap value)
extern _Atomic uint16_t g_tiny_tcache_cap;

// ============================================================================
// Hot Inline API (L0)
// ============================================================================

// Check if tcache is enabled
// Returns: 1 if enabled, 0 if disabled
static inline int tiny_tcache_enabled(void) {
    int val = atomic_load_explicit(&g_tiny_tcache_enabled, memory_order_relaxed);

    if (__builtin_expect(val == -1, 0)) {
        // Lazy init: read ENV once
        extern int tiny_tcache_env_init(void);
        val = tiny_tcache_env_init();
    }

    return val;
}

// Get tcache capacity per class
// Returns: capacity (default 64)
static inline uint16_t tiny_tcache_cap(void) {
    uint16_t cap = atomic_load_explicit(&g_tiny_tcache_cap, memory_order_relaxed);

    if (__builtin_expect(cap == 0, 0)) {
        // Lazy init: read ENV once
        extern int tiny_tcache_env_init(void);
        tiny_tcache_env_init();
        cap = atomic_load_explicit(&g_tiny_tcache_cap, memory_order_relaxed);
    }

    return cap;
}

// ============================================================================
// Cold API (L2)
// ============================================================================

// Refresh ENV cache (called from bench_profile after putenv)
// Pattern: Same as Phase 8/13 (FREE_STATIC_ROUTE, C7_PRESERVE_HEADER)
extern void tiny_tcache_env_refresh_from_env(void);

#endif // TINY_TCACHE_ENV_BOX_H
@ -30,6 +30,7 @@
 #include "../hakmem_tiny_config.h" // For TINY_NUM_CLASSES
 #include "../box/ptr_type_box.h" // Phantom pointer types (BASE/USER)
 #include "../box/tiny_front_config_box.h" // Phase 8-Step1: Config macros
+#include "../box/tiny_tcache_box.h" // Phase 14 v1: Intrusive LIFO tcache

 // ============================================================================
 // Phase 3 C2 Patch 3: Bounds Check Compile-out
@ -220,9 +221,16 @@ static inline int unified_cache_push(int class_idx, hak_base_ptr_t base) {
     // Fast path: Unified cache disabled → return 0 (not handled)
     if (__builtin_expect(!TINY_FRONT_UNIFIED_CACHE_ENABLED, 0)) return 0;

-    TinyUnifiedCache* cache = &g_unified_cache[class_idx]; // 1 cache miss (TLS)
     void* base_raw = HAK_BASE_TO_RAW(base);

+    // Phase 14 v1: Try tcache first (intrusive LIFO, no array access)
+    if (tiny_tcache_try_push(class_idx, base_raw)) {
+        return 1; // SUCCESS (tcache hit, no array access)
+    }
+
+    // Tcache overflow or disabled → fall through to array cache
+    TinyUnifiedCache* cache = &g_unified_cache[class_idx]; // 1 cache miss (TLS)
+
     // Phase 8-Step3: Lazy init check (conditional in PGO mode)
     // PGO builds assume bench_fast_init() prewarmed cache → remove check (-1 branch)
 #if !HAKMEM_TINY_FRONT_PGO
@ -281,7 +289,23 @@ static inline hak_base_ptr_t unified_cache_pop_or_refill(int class_idx) {
     }
 #endif

-    // Try pop from cache (fast path)
+    // Phase 14 v1: Try tcache first (intrusive LIFO, no array access)
+    void* tcache_base = tiny_tcache_try_pop(class_idx);
+    if (tcache_base != NULL) {
+#if !HAKMEM_BUILD_RELEASE
+        g_unified_cache_hit[class_idx]++;
+#endif
+        // Performance measurement: count cache hits (ENV enabled only)
+        if (__builtin_expect(unified_cache_measure_check(), 0)) {
+            atomic_fetch_add_explicit(&g_unified_cache_hits_global,
+                                      1, memory_order_relaxed);
+            atomic_fetch_add_explicit(&g_unified_cache_hits_by_class[class_idx],
+                                      1, memory_order_relaxed);
+        }
+        return HAK_BASE_FROM_RAW(tcache_base); // HIT (tcache, no array access)
+    }
+
+    // Tcache miss or disabled → try pop from array cache (fast path)
     if (__builtin_expect(cache->head != cache->tail, 1)) {
         void* base = cache->slots[cache->head]; // 1 cache miss (array access)
         cache->head = (cache->head + 1) & cache->mask;
@ -0,0 +1,184 @@
# Phase 14 v1: Pointer-Chase Reduction (tcache-style) A/B Test Results

**Date:** 2025-12-15
**Benchmark:** Mixed (16–1024B) 10-run cleanenv
**Target:** Reduce pointer-chase overhead with intrusive LIFO tcache layer
**Expected ROI:** +15-25% (design estimate)
**GO Threshold:** +1.0% mean improvement

---

## 1. Implementation Summary

Phase 14 v1 adds an intrusive LIFO tcache layer (L1) before the existing array-based UnifiedCache, inspired by the glibc tcache pattern.

**Key Components:**
- `core/box/tiny_tcache_env_box.{h,c}` - L0 ENV gate (HAKMEM_TINY_TCACHE=0/1, default 0)
- `core/box/tiny_tcache_box.h` - L1 intrusive LIFO cache (TLS per-class bins)
- `core/front/tiny_unified_cache.h` - Integration (try tcache first, fall through to array cache)
- `core/bench_profile.h` - Refresh sync for bench_profile

**Important Note (wiring completeness):**
- In v1 the tcache pop lives in `unified_cache_pop_or_refill()`, but the current main alloc hot path (`tiny_hot_alloc_fast()`) does not go through `unified_cache_pop_or_refill()`.
- The free side, however, enters tcache via `unified_cache_push()`, so with `HAKMEM_TINY_TCACHE=1` **tcache can become a "sink" and the alloc/pop-side ROI may not be measurable**.
- A follow-up fix (Phase 14 v2) should connect pop/push in `tiny_front_hot_box`, after which re-running the A/B test is recommended:
  - `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`

**Design:**
- TLS state: 8 TinyTcacheBin structs (head pointer + count per class)
- Intrusive next pointers stored in blocks (via tiny_next_store/load SSOT)
- Capacity: 64 blocks per class (default, configurable via HAKMEM_TINY_TCACHE_CAP)
- LIFO order: better cache locality than FIFO array cache
- Two-layer fallback: tcache (fast) → unified_cache (overflow/miss)
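
The design bullets above can be sketched as a small stand-alone model. This is an illustration under stated assumptions, not the code in `core/box/tiny_tcache_box.h`: `DemoBin`, `demo_push`, `demo_pop`, and `DEMO_CAP` are hypothetical names, the bin is passed explicitly instead of living in TLS, and the next pointer is assumed to sit at offset 0 of the free block, whereas the real code routes this through `tiny_next_store`/`tiny_next_load`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CAP 64u  /* hypothetical constant; mirrors the default cap of 64 */

/* One bin per size class: head of the intrusive list plus a block count. */
typedef struct {
    void*    head;
    uint16_t count;
} DemoBin;

/* Assumption: the next pointer occupies the first bytes of a free block. */
static void demo_next_store(void* block, void* next) {
    memcpy(block, &next, sizeof next);
}

static void* demo_next_load(const void* block) {
    void* next;
    memcpy(&next, block, sizeof next);
    return next;
}

/* LIFO push: returns false on overflow so the caller can fall back. */
static bool demo_push(DemoBin* bin, void* block) {
    if (bin->count >= DEMO_CAP) return false;
    demo_next_store(block, bin->head);  /* link block to the current head */
    bin->head = block;
    bin->count++;
    return true;
}

/* LIFO pop: returns NULL on a miss so the caller can fall back. */
static void* demo_pop(DemoBin* bin) {
    void* block = bin->head;
    if (block == NULL) return NULL;
    bin->head = demo_next_load(block);  /* unlink: head moves to the next block */
    assert(bin->count > 0);
    bin->count--;
    return block;
}
```

A hit touches only the bin's head pointer and the block itself, which is the property the "no array access" bullet refers to; the most recently freed block is also the first one handed back.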

**ENV Control:**
```bash
export HAKMEM_TINY_TCACHE=0       # Baseline (tcache disabled)
export HAKMEM_TINY_TCACHE=1       # Optimized (tcache enabled)
export HAKMEM_TINY_TCACHE_CAP=64  # Capacity per class (default: 64)
```

---

## 2. A/B Test Results (Mixed 10-run)

### Baseline (TCACHE=0):
```
Run 1: 51,264,525 ops/s
Run 2: 50,950,925 ops/s
Run 3: 51,500,295 ops/s
Run 4: 51,698,050 ops/s
Run 5: 50,396,686 ops/s
Run 6: 50,960,807 ops/s
Run 7: 50,616,179 ops/s
Run 8: 51,817,424 ops/s
Run 9: 50,762,958 ops/s
Run 10: 50,865,941 ops/s
```
**Mean:** 51,083,379 ops/s
**Median:** 50,955,866 ops/s

### Optimized (TCACHE=1):
```
Run 1: 51,555,414 ops/s
Run 2: 51,389,988 ops/s
Run 3: 50,795,917 ops/s
Run 4: 51,880,520 ops/s
Run 5: 50,574,457 ops/s
Run 6: 50,627,901 ops/s
Run 7: 51,233,081 ops/s
Run 8: 51,278,890 ops/s
Run 9: 50,761,326 ops/s
Run 10: 51,770,890 ops/s
```
**Mean:** 51,186,838 ops/s
**Median:** 51,255,986 ops/s

### Delta:
- **Mean delta:** +0.20% (103,459 ops/s improvement)
- **Median delta:** +0.59% (300,120 ops/s improvement)
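
The two deltas can be recomputed from the ten runs per configuration. The sketch below is a stand-alone check, not part of the benchmark scripts; `mean_of`, `median_of`, and `delta_percent` are hypothetical helper names, and the arrays simply repeat the run values listed in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RUNS 64  /* upper bound for the scratch copy used by median_of */

static const double BASELINE[10] = {
    51264525, 50950925, 51500295, 51698050, 50396686,
    50960807, 50616179, 51817424, 50762958, 50865941
};
static const double OPTIMIZED[10] = {
    51555414, 51389988, 50795917, 51880520, 50574457,
    50627901, 51233081, 51278890, 50761326, 51770890
};

static int cmp_double(const void* a, const void* b) {
    double x = *(const double*)a;
    double y = *(const double*)b;
    return (x > y) - (x < y);
}

static double mean_of(const double* v, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += v[i];
    return sum / (double)n;
}

/* Median of n samples; sorts a copy so the input array stays untouched. */
static double median_of(const double* v, size_t n) {
    double tmp[MAX_RUNS];
    assert(n > 0 && n <= MAX_RUNS);
    memcpy(tmp, v, n * sizeof tmp[0]);
    qsort(tmp, n, sizeof tmp[0], cmp_double);
    return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
}

/* Relative change of opt over base, in percent. */
static double delta_percent(double base, double opt) {
    return (opt - base) / base * 100.0;
}
```

Calling `delta_percent(mean_of(BASELINE, 10), mean_of(OPTIMIZED, 10))` yields roughly +0.20, and the same call over `median_of` yields roughly +0.59, which matches the figures reported above.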

---

## 3. Verdict: NEUTRAL

**Result:** +0.20% mean improvement (below +1.0% GO threshold)

**Analysis:**
- Phase 14 v1 shows minimal performance impact on Mixed workload
- Median delta (+0.59%) is higher than the mean delta, but the baseline runs alone span about 2.8% (50.40M to 51.82M ops/s), so neither delta is distinguishable from run-to-run noise
- Both deltas are below the +1.0% GO threshold → NEUTRAL classification
- Expected ROI (+15-25%) was not achieved

**Possible Reasons for Lower-than-Expected ROI:**
1. **Workload Mismatch:** Mixed workload (16–1024B) spans multiple classes (C0-C7), but tcache benefits may be concentrated in hot classes (C2/C3: 128B/256B). Mid/large classes (C5-C7) may not benefit as much.
2. **Cache Locality vs Array Access:** While tcache reduces pointer-chasing, the existing UnifiedCache array access may already be well-cached in L1/L2, limiting improvement.
3. **Cap Too Small:** Default cap=64 may be too small for high-churn workloads, causing frequent overflow to array cache.
4. **Intrusive Next Overhead:** Writing/reading next pointers may add overhead that offsets the pointer-chase reduction.
5. **Incomplete hot-path coverage (v1):** Only the free side feeds tcache while the alloc side never consumes it, so hits may be invisible (Phase 14 v2 needs to confirm that the path is actually wired in).

**Comparison to Smoke Test:**
- Smoke test (single run): +2.4% (51.88M vs 50.68M ops/s)
- Formal 10-run: +0.20% mean, +0.59% median
- Variance across runs suggests smoke test was an outlier

---

## 4. Recommendation: Freeze as Research Box

**Decision:** Freeze Phase 14 v1 as research box (default OFF)

**Rationale:**
- NEUTRAL result (+0.20%) does not justify promotion to default
- No measurable harm (close to baseline), suitable for research/experimentation
- Future work may explore:
  - Per-class cap tuning (hot classes get larger caps)
  - Workload-specific profiling (C2/C3-heavy vs C5-C7-heavy)
  - Alternative intrusive next pointer strategies

**Next Steps:**
1. Commit Phase 14 v1 implementation with NEUTRAL verdict
2. Update CURRENT_TASK.md to freeze as research box
3. Keep ENV gate (HAKMEM_TINY_TCACHE=0 default) for future experimentation
4. Consider alternative approaches for pointer-chase reduction (e.g., deeper pipeline optimization, better prefetching)

---

## 5. Raw Data

### Baseline (TCACHE=0):
```
51264525
50950925
51500295
51698050
50396686
50960807
50616179
51817424
50762958
50865941
```
Mean: 51,083,379 ops/s
Median: 50,955,866 ops/s

### Optimized (TCACHE=1):
```
51555414
51389988
50795917
51880520
50574457
50627901
51233081
51278890
50761326
51770890
```
Mean: 51,186,838 ops/s
Median: 51,255,986 ops/s

---

## 6. Files Modified

### Created:
- `core/box/tiny_tcache_env_box.h` - L0 ENV gate
- `core/box/tiny_tcache_env_box.c` - ENV init/refresh implementation
- `core/box/tiny_tcache_box.h` - L1 intrusive LIFO cache

### Modified:
- `core/front/tiny_unified_cache.h` - Integration (try tcache first)
- `core/bench_profile.h` - Refresh sync
- `Makefile` - Build system integration
- `scripts/run_mixed_10_cleanenv.sh` - ENV leak prevention (already updated)

---

## 7. Conclusion

Phase 14 v1 (Pointer-Chase Reduction via tcache-style intrusive LIFO) achieved **+0.20% mean improvement** on Mixed 10-run benchmark, which is **NEUTRAL** (below +1.0% GO threshold).

**Final Status:** Freeze as research box (HAKMEM_TINY_TCACHE=0 default, OFF)

**Future Work:** Consider per-class cap tuning or alternative pointer-chase reduction strategies.
@ -147,3 +147,12 @@ GO/NO-GO:
 - When the tcache hit rate is high, array accesses and FIFO-style reuse of old blocks can be avoided
 - The shortest move toward closing the "system malloc is faster" gap (tcache-like behavior)

+---
+
+## Update (2025-12-15)
+
+With only the v1 integration point (`core/front/tiny_unified_cache.h`), the current main alloc hot path (`tiny_hot_alloc_fast()`) does not consume tcache,
+so with `HAKMEM_TINY_TCACHE=1` tcache tends to become a "sink".
+
+Next, connect pop/push to the hot path (`core/box/tiny_front_hot_box.h`) and re-run the A/B test with the path actually wired in (Phase 14 v2):
+- `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`
@ -109,3 +109,12 @@ When GO:
 When NO-GO/NEUTRAL:
 - Freeze as research box (keep default OFF)

+---
+
+## Update (2025-12-15)
+
+With only the v1 integration point (`core/front/tiny_unified_cache.h`), the current main alloc hot path (`tiny_hot_alloc_fast()`) does not consume tcache,
+so with `HAKMEM_TINY_TCACHE=1` it tends to become a "sink".
+
+Next, connect pop/push to the hot path (`core/box/tiny_front_hot_box.h`) and re-run the A/B test with the path actually wired in (Phase 14 v2):
+- `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_2_NEXT_INSTRUCTIONS.md`
@ -0,0 +1,131 @@
# Phase 14 v2: Pointer-Chase Reduction - Hot Path Integration Next Instructions (Tiny tcache intrusive LIFO)

## Status

- Phase 14 v1 (tcache L1 added) was **NEUTRAL** on the Mixed 10-run (+0.20% mean / +0.59% median)
  - Results: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_AB_TEST_RESULTS.md`
  - Implementation: `core/box/tiny_tcache_env_box.{h,c}` / `core/box/tiny_tcache_box.h` / `core/front/tiny_unified_cache.h`
- However, the current v1 **feeds tcache only on the free side (`unified_cache_push()`), and the alloc side (`tiny_hot_alloc_fast()`) never consumes it**, so:
  - tcache is effectively a sink and the ROI cannot be measured correctly
  - the "tcache-style" premise (symmetric push/pop) is broken

Phase 14 v2 connects tcache to the **actual hot path of the tiny front** and re-takes a valid A/B measurement.

---

## 0. Goal (GO criteria)

On the Mixed 10-run (clean env):
- **GO**: mean +1.0% or better
- **NO-GO**: mean -1.0% or worse (immediate rollback / freeze)
- **NEUTRAL**: within ±1.0% (freeze as research box)

Additional gate (required):
- With `HAKMEM_TINY_TCACHE=1`, **tcache pops must actually occur** (zero pops means the design is not wired in)
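
The gate above reduces to a small decision function. A minimal sketch, assuming the mean delta is given in percent and the pop-hit count comes from a debug counter such as the one proposed in Patch 3; `Verdict`, `classify_ab`, and the `VERDICT_NOT_WIRED` outcome are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    VERDICT_GO,
    VERDICT_NEUTRAL,
    VERDICT_NO_GO,
    VERDICT_NOT_WIRED  /* hypothetical: tcache enabled but never popped */
} Verdict;

/* Classify an A/B run measured with HAKMEM_TINY_TCACHE=1. */
static Verdict classify_ab(double mean_delta_pct, uint64_t tcache_pop_hits) {
    assert(mean_delta_pct == mean_delta_pct);  /* reject NaN input */

    /* Additional gate: zero pops means the alloc side never consumed
       tcache, so the delta says nothing about the design itself. */
    if (tcache_pop_hits == 0) return VERDICT_NOT_WIRED;

    if (mean_delta_pct >= 1.0)  return VERDICT_GO;
    if (mean_delta_pct <= -1.0) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

Checking the pop counter first is a deliberate choice in this sketch: the v1 result (+0.20%) would otherwise be filed as NEUTRAL even though the pop path was never exercised.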

---

## 1. Box diagram (single boundary)

```
L0: tiny_tcache_env_box     (ENV gate / refresh / rollback)
      ↓
L1: tiny_tcache_box         (intrusive LIFO: push/pop, cap)
      ↓
L2: tiny_front_hot_box      (hot alloc/free: tcache → unified_cache(FIFO))
      ↓
L3: cold/refill             (unified_cache_refill → SuperSlab)
```

The boundary is fixed at a single point: **"tcache miss/overflow → existing UnifiedCache"**.

---

## 2. Patch order (small, incremental steps)

### Patch 1: Connect tcache pop to hot alloc (required)

Target:
- `core/box/tiny_front_hot_box.h`

Change:
- At the top of `tiny_hot_alloc_fast(int class_idx)`:
  - try `tiny_tcache_try_pop(class_idx)`
  - on HIT, return immediately via `tiny_header_finalize_alloc(base, class_idx)`
  - on MISS, fall back to the existing FIFO (`cache->slots[head]`)

Requirements:
- With tcache OFF (default), keep the diff minimal so that the hot path does not grow
- Strictly follow "when in doubt, fall back" (Fail-Fast)
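
The Patch 1 ordering can be sketched with stand-in types. Everything named `demo_*` or `Demo*` below is hypothetical and much simpler than the real `tiny_front_hot_box` code: the bin and ring are passed explicitly instead of living in TLS, the gate is a plain global, and header finalization is only marked by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SLOTS 8u  /* power of two so the ring index can be masked */

typedef struct { void* head; uint16_t count; } DemoTcache;
typedef struct { void* slots[DEMO_SLOTS]; uint16_t head, tail; } DemoFifo;

static int g_demo_tcache_on = 1;  /* stands in for the cached ENV gate */

/* Intrusive LIFO pop; assumes the next pointer sits at offset 0 of the block. */
static void* demo_tcache_pop(DemoTcache* tc) {
    if (!g_demo_tcache_on || tc->head == NULL) return NULL;
    void* block = tc->head;
    memcpy(&tc->head, block, sizeof tc->head);  /* head = block's next */
    assert(tc->count > 0);
    tc->count--;
    return block;
}

/* Existing array path: pop from a ring buffer of block pointers. */
static void* demo_fifo_pop(DemoFifo* f) {
    if (f->head == f->tail) return NULL;  /* empty; the real code would refill */
    void* block = f->slots[f->head];
    f->head = (uint16_t)((f->head + 1u) & (DEMO_SLOTS - 1u));
    return block;
}

/* Patch 1 order: tcache first, existing FIFO only on a miss. */
static void* demo_hot_alloc(DemoTcache* tc, DemoFifo* f) {
    void* block = demo_tcache_pop(tc);
    if (block != NULL) return block;  /* HIT: real code finalizes the header here */
    return demo_fifo_pop(f);          /* MISS or gate off: unchanged fallback */
}
```

With the gate off, `demo_hot_alloc` costs one extra predictable branch before the FIFO path, which is the kind of minimal overhead the requirement about the default-OFF configuration asks for.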

### Patch 2: Connect tcache push to hot free (recommended)

Target:
- `core/box/tiny_front_hot_box.h`

Change:
- At the top of `tiny_hot_free_fast(int class_idx, void* base)`:
  - try `tiny_tcache_try_push(class_idx, base)`
  - on SUCCESS, `return 1`
  - go to the existing FIFO only on overflow / disabled

Aim:
- Make tcache effective on "direct push" paths that do not go through `unified_cache_push()` as well

### Patch 3: Visibility (minimal, TLS)

Candidate target:
- `core/box/tiny_tcache_box.h` (TLS counters)

Add (for debug / research):
- `tcache_pop_hit/miss`
- `tcache_push_hit/overflow`
- A "one-shot dump" that can be emitted exactly once (ENV opt-in)

Forbidden:
- Do not put atomic statistics on the hot path (lesson from Phase 12 / POOL-DN-BATCH)
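
One way to satisfy both the counter list and the no-atomics rule is a plain thread-local struct. This is a sketch only: `DemoTcacheStats`, `g_demo_stats`, `demo_count_pop`, `demo_count_push`, and `demo_stats_format` are hypothetical names, the ENV opt-in is left out, and the dump covers the calling thread's counters only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t pop_hit;
    uint64_t pop_miss;
    uint64_t push_hit;
    uint64_t push_overflow;
} DemoTcacheStats;

/* Thread-local, so plain increments are enough: no atomics on the hot path. */
static _Thread_local DemoTcacheStats g_demo_stats;

static inline void demo_count_pop(int hit) {
    if (hit) g_demo_stats.pop_hit++;
    else     g_demo_stats.pop_miss++;
}

static inline void demo_count_push(int accepted) {
    if (accepted) g_demo_stats.push_hit++;
    else          g_demo_stats.push_overflow++;
}

/* Format the calling thread's counters into buf.
   Returns snprintf's result, i.e. the length the full message needs. */
static int demo_stats_format(char* buf, size_t len) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "[TINY_TCACHE] pop hit=%llu miss=%llu push hit=%llu overflow=%llu\n",
                    (unsigned long long)g_demo_stats.pop_hit,
                    (unsigned long long)g_demo_stats.pop_miss,
                    (unsigned long long)g_demo_stats.push_hit,
                    (unsigned long long)g_demo_stats.push_overflow);
}
```

A `pop_hit` of zero with `HAKMEM_TINY_TCACHE=1` would directly expose the v1 "sink" condition that the additional gate in section 0 is meant to catch.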

---

## 3. A/B test (same binary)

Baseline:
```sh
HAKMEM_TINY_TCACHE=0 scripts/run_mixed_10_cleanenv.sh
```

Optimized:
```sh
HAKMEM_TINY_TCACHE=1 scripts/run_mixed_10_cleanenv.sh
```

Additional (check whether the effect is class-dependent):
```sh
HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_TCACHE=0 scripts/run_mixed_10_cleanenv.sh
HAKMEM_BENCH_C7_ONLY=1 HAKMEM_TINY_TCACHE=1 scripts/run_mixed_10_cleanenv.sh
```

Cap sweep (research, only when needed):
```sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=32 scripts/run_mixed_10_cleanenv.sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=64 scripts/run_mixed_10_cleanenv.sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=128 scripts/run_mixed_10_cleanenv.sh
```

---

## 4. Health check (required)

```sh
scripts/verify_health_profiles.sh
```

---

## 5. Verdict and handling

- GO: start promotion into `bench_profile` with **MIXED_TINYV3_C7_SAFE only** (staged rollout)
- NEUTRAL/NO-GO: freeze Phase 14 v2 as a research box (keep default OFF)
- Rollback:
  - `export HAKMEM_TINY_TCACHE=0`