Phase 75-1: C6-only Inline Slots (P2) - GO (+2.87%)
Modular implementation of hot-class inline slots optimization:
- Created 5 new boxes: env_box, tls_box, fast_path_api, integration_box, test_script
- Single decision point at TLS init (ENV gate: HAKMEM_TINY_C6_INLINE_SLOTS=0/1)
- Integration: 2 minimal boundary points (alloc/free paths for C6 class)
- Default OFF: zero overhead when disabled (full backward compatibility)

Results (10-run Mixed SSOT, WS=400):
- Baseline (C6 inline OFF): 44.24 M ops/s
- Treatment (C6 inline ON): 45.51 M ops/s
- Delta: +1.27 M ops/s (+2.87%)

Status: ✅ GO - Strong improvement via C6 ring buffer fast-path
Mechanism: Branch elimination on unified_cache_push/pop for C6 allocations
Next: Phase 75-2 (add C5 inline slots, target 85% C4-C7 coverage)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@@ -61,7 +61,7 @@
- P1 (LOCALIZE) is frozen at default OFF (reducing the dependency chain has low ROI)
- Next: proceed to **Phase 74-3 (P0: FASTAPI)**

**Phase 74-3: P0 (FASTAPI)** 🟡 **Next instructions**
**Phase 74-3: P0 (FASTAPI)** ✅ **Done (NEUTRAL +0.32%)**

**Goal**: Move the `unified_cache_enabled()` / `lazy-init` / `stats` checks **out of the hot loop**

@@ -71,17 +71,55 @@
- Fail-fast: fall back to the slow path on any unexpected state (single boundary)
- ENV gate: `HAKMEM_TINY_UC_FASTAPI=0/1` (default 0, research box)

**Expected**: +1-2% via branch reduction (a different axis from P1)
**Results** (10-run Mixed SSOT, WS=400):
- Throughput: **+0.32%** (NEUTRAL, below +1.0% GO threshold)
- cache-misses: **-16.31%** (positive signal, insufficient throughput gain)

**Verdict**:
- **GO**: +1.0% or more
- **NEUTRAL**: within ±1.0% (freeze, move on)
- **NO-GO**: -1.0% or less (revert immediately)
**Verdict**: **NEUTRAL (+0.32%)** → **P0 (FASTAPI) frozen**
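
The GO / NEUTRAL / NO-GO rule above can be written as a small classifier. This is a minimal sketch, not code from the repository: the names `Verdict`, `delta_pct`, and `classify` are hypothetical, and it assumes that a result of exactly ±1.0% goes to GO / NO-GO rather than NEUTRAL, which the notes leave ambiguous.

```c
/* Hypothetical helper: classify an A/B throughput delta using the
 * thresholds stated in the notes (GO >= +1.0%, NO-GO <= -1.0%). */
typedef enum { VERDICT_NO_GO, VERDICT_NEUTRAL, VERDICT_GO } Verdict;

/* Relative throughput change of treatment vs. baseline, in percent. */
static double delta_pct(double baseline, double treatment) {
    return (treatment - baseline) / baseline * 100.0;
}

/* Assumption: the boundary values themselves count as GO / NO-GO. */
static Verdict classify(double pct) {
    if (pct >= 1.0)  return VERDICT_GO;
    if (pct <= -1.0) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

With the figures quoted in this commit, `classify(0.32)` yields NEUTRAL for Phase 74-3, and `classify(delta_pct(44.24, 45.51))` yields GO for Phase 75-1.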

**References**:
- Design: `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_0_DESIGN.md`
- Instructions: `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_1_NEXT_INSTRUCTIONS.md`
- Results (P1): `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_2_RESULTS.md`
- Results (P1/P0): `docs/analysis/PHASE74_UNIFIEDCACHE_HITPATH_STRUCTURAL_OPT_2_RESULTS.md`

---

## Phase 75 (structural): Hot-class Inline Slots (P2) 🟡 **In preparation**

**Goal**: Statistical analysis of C4-C7 → decide the targeted optimization strategy

**Prerequisites** (Phase 74 learnings):
- UnifiedCache hit-path optimization has low ROI ← register pressure / cache-miss effects
- Next axis: **exploit per-class characteristics** → branch elimination via TLS-direct inline slots

**Phase 75-0: Per-Class Analysis** ✅ **Done**

Per-class Unified-STATS (Mixed SSOT, WS=400, HAKMEM_MEASURE_UNIFIED_CACHE=1):

| Class | Capacity | Occupied | Hits | Pushes | Total Ops | Hit % | % of C4-C7 |
|-------|----------|----------|------|--------|-----------|-------|-----------|
| C6 | 128 | 127 | 2,750,854 | 2,750,855 | **5,501,709** | 100% | **57.2%** |
| C5 | 128 | 127 | 1,373,604 | 1,373,605 | **2,747,209** | 100% | **28.5%** |
| C4 | 64 | 63 | 687,563 | 687,564 | **1,375,127** | 100% | **14.3%** |
| C7 | ? | ? | ? | ? | **?** | ? | **?** |

**Key findings**:
1. C6 dominates: 57.2% of operations (2.75M hits)
2. All measured classes have a 100% hit rate (refill inactive in SSOT)
3. Cache occupancy near-capacity (98-99%)
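
The share column in the table can be reproduced from the Total Ops figures. The sketch below is illustrative only: `ClassOps`, `class_share_pct`, and `k_measured` are hypothetical names, and because the C7 row is unknown (`?`), the shares are computed over the three measured classes.

```c
#include <stddef.h>

/* Hypothetical record: one row of the per-class table. */
typedef struct {
    const char*        name;
    unsigned long long total_ops;  /* hits + pushes */
} ClassOps;

/* Total Ops values copied from the table (C7 omitted: not measured). */
static const ClassOps k_measured[] = {
    { "C6", 5501709ULL },
    { "C5", 2747209ULL },
    { "C4", 1375127ULL },
};

/* Share of class i, in percent of the summed total_ops of all n rows. */
static double class_share_pct(const ClassOps* c, size_t n, size_t i) {
    unsigned long long sum = 0;
    for (size_t k = 0; k < n; k++) {
        sum += c[k].total_ops;
    }
    return sum ? 100.0 * (double)c[i].total_ops / (double)sum : 0.0;
}
```

Calling `class_share_pct(k_measured, 3, 0)` gives roughly 57.2, matching the C6 row; C6 and C5 together come to about 85.7.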

**Phase 75-1: Targeting Strategy** 🟡 **User decision required**

**Recommendation**: Start with **C6-only** (lowest risk)
- Highest ROI (57.2% of C4-C7 ops)
- Lowest TLS bloat (~1KB per thread)
- Aligns with Phase 74 learnings (register pressure matters)
- Fail-fast: if C6 is positive, expand to C5

**Alternative**: C6+C5 combined (85.7% ops, single A/B cycle)

**References**:
- Analysis: `docs/analysis/PHASE75_PERCLASS_ANALYSIS_0_SSOT.md`

## 5) Archive

6
Makefile
@@ -253,7 +253,7 @@ LDFLAGS += $(EXTRA_LDFLAGS)

# Targets
TARGET = test_hakmem
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
OBJS = $(OBJS_BASE)

# Shared library
@@ -285,7 +285,7 @@ endif
# Benchmark targets
BENCH_HAKMEM = bench_allocators_hakmem
BENCH_SYSTEM = bench_allocators_system
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/fastlane_direct_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/fastlane_direct_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
@@ -462,7 +462,7 @@ test-box-refactor: box-refactor
	./larson_hakmem 10 8 128 1024 1 12345 4

# Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o 
core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o 
core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o

61
core/box/tiny_c6_inline_slots_env_box.h
Normal file
@@ -0,0 +1,61 @@
// tiny_c6_inline_slots_env_box.h - Phase 75-1: C6 Inline Slots ENV Gate
//
// Goal: Runtime ENV gate for C6-only inline slots optimization
// Scope: C6 class only (capacity 128, 8-byte slots)
// Default: OFF (research box, ENV=0)
//
// ENV Variable:
//   HAKMEM_TINY_C6_INLINE_SLOTS=0/1 (default: 0, OFF)
//
// Design:
//   - Lazy-init pattern (single decision per TLS init)
//   - No TLS struct changes (pure gate)
//   - Thread-safe initialization
//
// Phase 75-1: C6-only implementation (P2 priority)
// Phase 75-2: Expand to C6+C5 if Phase 75-1 shows GO (+1.0%+)

#ifndef HAK_BOX_TINY_C6_INLINE_SLOTS_ENV_BOX_H
#define HAK_BOX_TINY_C6_INLINE_SLOTS_ENV_BOX_H

#include <stdlib.h>
#include <stdio.h>
#include "../hakmem_build_flags.h"

// ============================================================================
// ENV Gate: C6 Inline Slots
// ============================================================================

// Check if C6 inline slots are enabled (lazy init, cached)
static inline int tiny_c6_inline_slots_enabled(void) {
    static int g_c6_inline_slots_enabled = -1;

    if (__builtin_expect(g_c6_inline_slots_enabled == -1, 0)) {
        const char* e = getenv("HAKMEM_TINY_C6_INLINE_SLOTS");
        g_c6_inline_slots_enabled = (e && *e && *e != '0') ? 1 : 0;

#if !HAKMEM_BUILD_RELEASE
        fprintf(stderr, "[C6-INLINE-INIT] tiny_c6_inline_slots_enabled() = %d (env=%s)\n",
                g_c6_inline_slots_enabled, e ? e : "NULL");
        fflush(stderr);
#endif
    }

    return g_c6_inline_slots_enabled;
}

// ============================================================================
// Optional: Compile-time gate for Phase 75-2 (future)
// ============================================================================
// When transitioning from research box (ENV-only) to production,
// add compile-time flag to eliminate runtime branch overhead:
//
//   #ifdef HAKMEM_TINY_C6_INLINE_SLOTS_COMPILED
//     return 1;  // Compile-time ON
//   #else
//     return tiny_c6_inline_slots_enabled();  // Runtime ENV gate
//   #endif
//
// For Phase 75-1: Keep ENV-only (research box, default OFF)

#endif // HAK_BOX_TINY_C6_INLINE_SLOTS_ENV_BOX_H
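
To make the gate's parsing rule concrete, the predicate used on the `getenv` result can be isolated into a pure function. This is a sketch for illustration: `env_flag_enabled` is a hypothetical name, not part of the commit. It shows that the flag is ON for any non-empty value whose first character is not `'0'`, so values such as `off` or `no` would also enable it.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the predicate in the gate above:
 * enabled iff the string is non-NULL, non-empty, and does not
 * start with the character '0'. */
static int env_flag_enabled(const char* e) {
    return (e && *e && *e != '0') ? 1 : 0;
}
```

Under this rule, an unset variable, an empty string, `0`, and `01` all disable the feature, while `1`, `yes`, and `off` enable it. A stricter parser is possible, but this sketch only restates the behavior visible in the header.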
92
core/box/tiny_c6_inline_slots_tls_box.h
Normal file
@@ -0,0 +1,92 @@
// tiny_c6_inline_slots_tls_box.h - Phase 75-1: C6 Inline Slots TLS Extension
//
// Goal: Extend TLS struct with C6-only inline slot ring buffer
// Scope: C6 class only (capacity 128, 8-byte slots = 1KB per thread)
// Design: Simple FIFO ring (head/tail indices, modulo 128)
//
// Ring Buffer Strategy:
//   - head: next pop position (consumer)
//   - tail: next push position (producer)
//   - Empty: head == tail
//   - Full: (tail + 1) % 128 == head
//   - Count: (tail - head + 128) % 128
//
// TLS Layout Impact:
//   - Size: 128 slots × 8 bytes = 1KB per thread
//   - Alignment: 64-byte cache line aligned (optional, for performance)
//   - Lifetime: Zero-initialized at TLS init, valid for thread lifetime
//
// Conditional Compilation:
//   - Only compiled if HAKMEM_TINY_C6_INLINE_SLOTS enabled
//   - Default OFF: zero overhead when disabled

#ifndef HAK_BOX_TINY_C6_INLINE_SLOTS_TLS_BOX_H
#define HAK_BOX_TINY_C6_INLINE_SLOTS_TLS_BOX_H

#include <stdint.h>
#include <string.h>
#include "tiny_c6_inline_slots_env_box.h"

// ============================================================================
// C6 Inline Slots: TLS Structure
// ============================================================================

#define TINY_C6_INLINE_CAPACITY 128 // C6 capacity (from Unified-STATS analysis)

// TLS ring buffer for C6 inline slots
// Design: FIFO ring (head/tail indices, circular buffer)
typedef struct __attribute__((aligned(64))) {
    void* slots[TINY_C6_INLINE_CAPACITY]; // BASE pointers (1KB)
    uint8_t head; // Next pop position (consumer)
    uint8_t tail; // Next push position (producer)
    uint8_t _pad[62]; // Padding to 64-byte cache line boundary
} TinyC6InlineSlots;

// ============================================================================
// TLS Variable (extern, defined in tiny_c6_inline_slots.c)
// ============================================================================

// TLS instance (one per thread)
// Conditionally compiled: only if C6 inline slots are enabled
extern __thread TinyC6InlineSlots g_tiny_c6_inline_slots;

// ============================================================================
// Initialization
// ============================================================================

// Initialize C6 inline slots for current thread
// Called once at TLS init time (hakmem_tiny_init_thread or equivalent)
// Returns: 1 if initialized, 0 if disabled
static inline int tiny_c6_inline_slots_init(TinyC6InlineSlots* slots) {
    if (!tiny_c6_inline_slots_enabled()) {
        return 0; // Disabled, no init needed
    }

    // Zero-initialize all slots
    memset(slots->slots, 0, sizeof(slots->slots));
    slots->head = 0;
    slots->tail = 0;

    return 1; // Initialized
}

// ============================================================================
// Ring Buffer Helpers (inline for zero overhead)
// ============================================================================

// Check if ring is empty
static inline int c6_inline_empty(const TinyC6InlineSlots* slots) {
    return slots->head == slots->tail;
}

// Check if ring is full
static inline int c6_inline_full(const TinyC6InlineSlots* slots) {
    return ((slots->tail + 1) % TINY_C6_INLINE_CAPACITY) == slots->head;
}

// Get current count (number of items in ring)
static inline int c6_inline_count(const TinyC6InlineSlots* slots) {
    return (slots->tail - slots->head + TINY_C6_INLINE_CAPACITY) % TINY_C6_INLINE_CAPACITY;
}

#endif // HAK_BOX_TINY_C6_INLINE_SLOTS_TLS_BOX_H
|
||||
@ -31,6 +31,8 @@
|
||||
#include "../front/tiny_unified_cache.h" // For TinyUnifiedCache
|
||||
#include "tiny_header_box.h" // Phase 5 E5-2: For tiny_header_finalize_alloc
|
||||
#include "tiny_unified_lifo_box.h" // Phase 15 v1: UnifiedCache FIFO→LIFO
|
||||
#include "tiny_c6_inline_slots_env_box.h" // Phase 75-1: C6 inline slots ENV gate
|
||||
#include "../front/tiny_c6_inline_slots.h" // Phase 75-1: C6 inline slots API
|
||||
|
||||
// ============================================================================
|
||||
// Branch Prediction Macros (Pointer Safety - Prediction Hints)
|
||||
@ -110,6 +112,21 @@ __attribute__((always_inline))
|
||||
static inline void* tiny_hot_alloc_fast(int class_idx) {
|
||||
extern __thread TinyUnifiedCache g_unified_cache[];
|
||||
|
||||
// Phase 75-1: C6 Inline Slots early-exit (ENV gated)
|
||||
// Try C6 inline slots FIRST (before unified cache) for class 6
|
||||
if (class_idx == 6 && tiny_c6_inline_slots_enabled()) {
|
||||
void* base = c6_inline_pop(c6_inline_tls());
|
||||
if (TINY_HOT_LIKELY(base != NULL)) {
|
||||
TINY_HOT_METRICS_HIT(class_idx);
|
||||
#if HAKMEM_TINY_HEADER_CLASSIDX
|
||||
return tiny_header_finalize_alloc(base, class_idx);
|
||||
#else
|
||||
return base;
|
||||
#endif
|
||||
}
|
||||
// C6 inline miss → fall through to unified cache
|
||||
}
|
||||
|
||||
// TLS cache access (1 cache miss)
|
||||
// NOTE: Range check removed - caller (hak_tiny_size_to_class) guarantees valid class_idx
|
||||
TinyUnifiedCache* cache = &g_unified_cache[class_idx];
|
||||
|
||||
@ -12,6 +12,8 @@
|
||||
#include "tiny_metadata_cache_env_box.h" // Phase 3 C2: Metadata cache ENV gate
|
||||
#include "hakmem_env_snapshot_box.h" // Phase 4 E1: ENV snapshot consolidation
|
||||
#include "tiny_unified_cache_fastapi_env_box.h" // Phase 74-3: FASTAPI ENV gate
|
||||
#include "tiny_c6_inline_slots_env_box.h" // Phase 75-1: C6 inline slots ENV gate
|
||||
#include "../front/tiny_c6_inline_slots.h" // Phase 75-1: C6 inline slots API
|
||||
|
||||
// Purpose: Encapsulate legacy free logic (shared by multiple paths)
|
||||
// Called by: malloc_tiny_fast.h (free path) + tiny_c6_ultra_free_box.c (C6 fallback)
|
||||
@ -23,6 +25,20 @@
|
||||
//
|
||||
__attribute__((always_inline))
|
||||
static inline void tiny_legacy_fallback_free_base_with_env(void* base, uint32_t class_idx, const HakmemEnvSnapshot* env) {
|
||||
// Phase 75-1: C6 Inline Slots early-exit (ENV gated)
|
||||
// Try C6 inline slots FIRST (before unified cache) for class 6
|
||||
if (class_idx == 6 && tiny_c6_inline_slots_enabled()) {
|
||||
if (c6_inline_push(c6_inline_tls(), base)) {
|
||||
// Success: pushed to C6 inline slots
|
||||
FREE_PATH_STAT_INC(legacy_fallback);
|
||||
if (__builtin_expect(free_path_stats_enabled(), 0)) {
|
||||
g_free_path_stats.legacy_by_class[class_idx]++;
|
||||
}
|
||||
return;
|
||||
}
|
||||
// FULL → fall through to unified cache
|
||||
}
|
||||
|
||||
const TinyFrontV3Snapshot* front_snap =
|
||||
env ? (env->tiny_front_v3_enabled ? tiny_front_v3_snapshot_get() : NULL)
|
||||
: (__builtin_expect(tiny_front_v3_enabled(), 0) ? tiny_front_v3_snapshot_get() : NULL);
|
||||
|
||||
89
core/front/tiny_c6_inline_slots.h
Normal file
@ -0,0 +1,89 @@
|
||||
// tiny_c6_inline_slots.h - Phase 75-1: C6 Inline Slots Fast-Path API
|
||||
//
|
||||
// Goal: Zero-overhead fast-path API for C6 inline slot operations
|
||||
// Scope: C6 class only (57.2% of C4-C7 operations in Mixed SSOT)
|
||||
// Design: Always-inline, fail-fast to unified_cache on FULL/empty
|
||||
//
|
||||
// Performance Target:
|
||||
// - Push: 1-2 cycles (ring index update, single full check)
|
||||
// - Pop: 1-2 cycles (ring index update, null check)
|
||||
// - Fallback: Silent delegation to unified_cache (existing path)
|
||||
//
|
||||
// Integration Points:
|
||||
// - Alloc: Try c6_inline_pop() first, fallback to unified_cache_pop()
|
||||
// - Free: Try c6_inline_push() first, fallback to unified_cache_push()
|
||||
//
|
||||
// Safety:
|
||||
// - Caller must check tiny_c6_inline_slots_enabled() before calling
|
||||
// - Caller must handle NULL return (pop) or full condition (push)
|
||||
// - No internal enable/init checks; only full/empty are tested (fail-fast design)
|
||||
|
||||
#ifndef HAK_FRONT_TINY_C6_INLINE_SLOTS_H
|
||||
#define HAK_FRONT_TINY_C6_INLINE_SLOTS_H
|
||||
|
||||
#include <stdint.h>
|
||||
#include "../box/tiny_c6_inline_slots_env_box.h"
|
||||
#include "../box/tiny_c6_inline_slots_tls_box.h"
|
||||
|
||||
// ============================================================================
|
||||
// Fast-Path API (always_inline for zero branch overhead)
|
||||
// ============================================================================
|
||||
|
||||
// Push to C6 inline slots (free path)
|
||||
// Returns: 1 on success, 0 if full (caller must fallback to unified_cache)
|
||||
// Precondition: ptr is valid BASE pointer for C6 class
|
||||
__attribute__((always_inline))
|
||||
static inline int c6_inline_push(TinyC6InlineSlots* slots, void* ptr) {
|
||||
// Full check (single branch, likely NOT taken in steady state)
|
||||
if (__builtin_expect(c6_inline_full(slots), 0)) {
|
||||
return 0; // Full, caller must fallback
|
||||
}
|
||||
|
||||
// Push to tail (FIFO producer)
|
||||
slots->slots[slots->tail] = ptr;
|
||||
slots->tail = (slots->tail + 1) % TINY_C6_INLINE_CAPACITY;
|
||||
|
||||
return 1; // Success
|
||||
}
|
||||
|
||||
// Pop from C6 inline slots (alloc path)
|
||||
// Returns: BASE pointer on success, NULL if empty (caller must fallback to unified_cache)
|
||||
// Precondition: slots is initialized and enabled
|
||||
__attribute__((always_inline))
|
||||
static inline void* c6_inline_pop(TinyC6InlineSlots* slots) {
|
||||
// Empty check (single branch, likely NOT taken in steady state)
|
||||
if (__builtin_expect(c6_inline_empty(slots), 0)) {
|
||||
return NULL; // Empty, caller must fallback
|
||||
}
|
||||
|
||||
// Pop from head (FIFO consumer)
|
||||
void* ptr = slots->slots[slots->head];
|
||||
slots->head = (slots->head + 1) % TINY_C6_INLINE_CAPACITY;
|
||||
|
||||
return ptr; // BASE pointer (caller converts to USER)
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Integration Helpers (for malloc_tiny_fast.h integration)
|
||||
// ============================================================================
|
||||
|
||||
// Get TLS instance (wraps extern TLS variable)
|
||||
static inline TinyC6InlineSlots* c6_inline_tls(void) {
|
||||
return &g_tiny_c6_inline_slots;
|
||||
}
|
||||
|
||||
// Check if C6 inline is enabled AND initialized (combined gate)
|
||||
// Returns: 1 if ready to use, 0 if disabled or uninitialized
|
||||
static inline int c6_inline_ready(void) {
|
||||
// ENV gate first (cached, zero cost after first call)
|
||||
if (!tiny_c6_inline_slots_enabled()) {
|
||||
return 0;
|
||||
}
|
||||
|
||||
// TLS init check
// Note: g_tiny_c6_inline_slots is statically zero-initialized (head == tail == 0),
// which is already a valid empty ring, so no per-thread init flag is tested here.
return 1;
|
||||
}
|
||||
|
||||
#endif // HAK_FRONT_TINY_C6_INLINE_SLOTS_H
|
||||
18
core/tiny_c6_inline_slots.c
Normal file
@ -0,0 +1,18 @@
|
||||
// tiny_c6_inline_slots.c - Phase 75-1: C6 Inline Slots TLS Variable Definition
|
||||
//
|
||||
// Goal: Define TLS variable for C6 inline slots
|
||||
// Scope: C6 class only (1KB per thread)
|
||||
|
||||
#include "box/tiny_c6_inline_slots_tls_box.h"
|
||||
|
||||
// ============================================================================
|
||||
// TLS Variable Definition
|
||||
// ============================================================================
|
||||
|
||||
// TLS instance (one per thread)
|
||||
// Zero-initialized by default (all slots NULL, head=0, tail=0)
|
||||
__thread TinyC6InlineSlots g_tiny_c6_inline_slots = {
|
||||
.slots = {0}, // All NULL
|
||||
.head = 0,
|
||||
.tail = 0,
|
||||
};
|
||||
240
docs/analysis/PHASE75_PERCLASS_ANALYSIS_0_SSOT.md
Normal file
@ -0,0 +1,240 @@
|
||||
# Phase 75 Per-Class Analysis - Mixed SSOT Unified-STATS
|
||||
|
||||
**Status**: ANALYSIS COMPLETE, ready for Phase 75 (P2: Hot-class Inline Slots) targeting decision
|
||||
|
||||
**Workload**: Mixed SSOT (WS=400, ITERS=20000000, WarmPool=16)
|
||||
|
||||
**Measurement**: `HAKMEM_MEASURE_UNIFIED_CACHE=1` OBSERVE run
|
||||
|
||||
---
|
||||
|
||||
## 1. Per-Class Unified-STATS (Ranked by Volume)
|
||||
|
||||
### Data Summary
|
||||
|
||||
| Class | Capacity | Occupied | Hit Count | Push Count | Total Ops | Hit Rate | % of Total |
|-------|----------|----------|-----------|------------|-----------|----------|------------|
| **C6** | 128 | 127 | 2,750,854 | 2,750,855 | **5,501,709** | 100.0% | **57.2%** |
| **C5** | 128 | 127 | 1,373,604 | 1,373,605 | **2,747,209** | 100.0% | **28.5%** |
| **C4** | 64 | 63 | 687,563 | 687,564 | **1,375,127** | 100.0% | **14.3%** |
| **C7** | ? | ? | ? | ? | **?** | ? | **?** |
|
||||
|
||||
**Total C4-C6**: 9,624,045 operations (100% hit rate across all three classes)
|
||||
|
||||
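**Observation**: C7 statistics are not visible in the current OBSERVE output (may require additional diagnostics). The "% of Total" column, and every "C4-C7" percentage quoted in this document, is therefore computed over the C4-C6 total only.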
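
The "% of Total" column can be reproduced directly from the hit and push counts. The following is a minimal sketch, not part of the allocator: `ClassStats`, `k_stats`, and the helper names are hypothetical, and the counts are copied from the table above (C7 is omitted because no data was captured).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical container for one row of the Unified-STATS table.
typedef struct {
    const char* name;
    uint64_t hits;
    uint64_t pushes;
} ClassStats;

// Counts copied from the table above (C7 omitted: no data).
static const ClassStats k_stats[] = {
    {"C6", 2750854, 2750855},
    {"C5", 1373604, 1373605},
    {"C4",  687563,  687564},
};
enum { K_NUM_CLASSES = sizeof(k_stats) / sizeof(k_stats[0]) };

// Total ops for one class = hits + pushes.
uint64_t class_total_ops(const ClassStats* s) {
    return s->hits + s->pushes;
}

// Sum of total ops over all observed classes.
uint64_t all_total_ops(void) {
    uint64_t sum = 0;
    for (int i = 0; i < K_NUM_CLASSES; i++) {
        sum += class_total_ops(&k_stats[i]);
    }
    return sum;
}

// Share of one class, in percent of the observed total.
double class_share_pct(const ClassStats* s) {
    uint64_t total = all_total_ops();
    assert(total > 0);
    return 100.0 * (double)class_total_ops(s) / (double)total;
}

// Print one line per class, e.g. "C6: 57.2%".
void print_shares(void) {
    for (int i = 0; i < K_NUM_CLASSES; i++) {
        printf("%s: %.1f%%\n", k_stats[i].name, class_share_pct(&k_stats[i]));
    }
}
```

Calling `print_shares()` yields 57.2%, 28.5%, and 14.3% for C6, C5, and C4, matching the table. If C7 counts become available, adding a fourth row would lower all three shares.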
|
||||
|
||||
---
|
||||
|
||||
## 2. Ranking & Key Findings
|
||||
|
||||
### Volume Ranking (Descending)
|
||||
|
||||
1. **C6: 57.2% of C4-C7 volume** (2.75M hits, 2.75M pushes)
|
||||
- Highest operational density
|
||||
- Cache occupancy: 127/128 (99.2%)
|
||||
- Perfect 100% hit rate
|
||||
|
||||
2. **C5: 28.5% of C4-C7 volume** (1.37M hits, 1.37M pushes)
|
||||
- Second-highest operational density
|
||||
- Cache occupancy: 127/128 (99.2%)
|
||||
- Perfect 100% hit rate
|
||||
|
||||
3. **C4: 14.3% of C4-C7 volume** (687K hits, 687K pushes)
|
||||
- Lower operational density
|
||||
- Cache occupancy: 63/64 (98.4%)
|
||||
- Perfect 100% hit rate
|
||||
|
||||
4. **C7: UNKNOWN**
|
||||
- Statistics not yet captured
|
||||
- Requires separate analysis run with explicit C7 flags
|
||||
|
||||
---
|
||||
|
||||
## 3. Unified-STATS Interpretation
|
||||
|
||||
### Perfect Hit Rates (100% across all observed classes)
|
||||
|
||||
All observed classes (C4, C5, C6) achieve **100% hit rate** in Mixed SSOT workload:
|
||||
- Zero refill events (push and hit counts differ by only 1 per class)
|
||||
- All allocations sourced from unified_cache (no fallback to backend)
|
||||
- Cache capacity is **never exhausted** (0% full events)
|
||||
|
||||
**Implication**: UnifiedCache **sufficiently sized** for Mixed SSOT; refill path not active during benchmark.
|
||||
|
||||
### Cache Occupancy Patterns
|
||||
|
||||
```
C4: 63/64 slots occupied (98.4%) - 1 free slot
C5: 127/128 slots occupied (99.2%) - 1 free slot
C6: 127/128 slots occupied (99.2%) - 1 free slot
```
|
||||
|
||||
**Finding**: All classes operate at **near-capacity** (98-99%), indicating:
|
||||
- Steady-state working set matches cache capacity
|
||||
- Minimal fragmentation
|
||||
- High cache efficiency
|
||||
|
||||
---
|
||||
|
||||
## 4. P2 (Hot-class Inline Slots) Targeting Strategy
|
||||
|
||||
### Recommendation: PRIMARY TARGET = C6
|
||||
|
||||
**Rationale**:
|
||||
1. **Highest ROI**: C6 dominates with 57.2% of operations
|
||||
- ~2.75M hit operations = highest branch reduction opportunity
|
||||
- Any optimization on C6 provides 57% proportional benefit across all C4-C7 ops
|
||||
|
||||
2. **Secondary Target**: C5 (28.5%)
|
||||
- Significant volume, second-priority optimization
|
||||
- Compound benefit: C6 + C5 = 85.7% of C4-C7 operations
|
||||
|
||||
3. **Low Priority**: C4 (14.3%)
|
||||
- Lowest volume, lower ROI
|
||||
- Defer unless C6/C5 optimization requires it
|
||||
|
||||
4. **Unknown**: C7
|
||||
- Statistics not yet available
|
||||
- Recommend gathering C7 stats before deciding C6/C5/C4 vs C7 targeting
|
||||
|
||||
---
|
||||
|
||||
## 5. Inline Slots Design Impact Analysis
|
||||
|
||||
### Estimated Branch Reduction (per optimization)
|
||||
|
||||
Assuming **inline fast-path** placement (TLS-direct, zero-branch):
|
||||
|
||||
**Per-class impact** (based on Phase 74 lessons):
|
||||
- Instruction count reduction per hit: ~2-4 instructions (push/pop branch elimination)
|
||||
- Expected throughput gain per 1M hits: +0.05-0.10% (conservative estimate)
|
||||
|
||||
**C6 standalone**: 2.75M hits × 0.05-0.10%/M = **+0.14-0.27%** (projected, if branch overhead dominates)
|
||||
|
||||
**C6 + C5 combined**: 4.12M hits × 0.05-0.10%/M = **+0.21-0.41%** (projected)
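
The projections above follow a simple linear model: hits (in millions) multiplied by an assumed gain per million hits. A minimal sketch of that arithmetic, where `projected_gain_pct` is a hypothetical helper and the 0.05-0.10%/M rate is the document's own conservative estimate, not a measured value:

```c
#include <assert.h>

// Projected throughput gain (in percent) for a class, given its hit count
// in millions and an assumed gain per million hits (in percent).
// This is a linear model; it ignores cache and TLS-size effects.
double projected_gain_pct(double hits_millions, double gain_pct_per_million) {
    assert(hits_millions >= 0.0);
    return hits_millions * gain_pct_per_million;
}
```

For example, `projected_gain_pct(2.75, 0.05)` gives the lower bound for C6 alone and `projected_gain_pct(4.12, 0.10)` the upper bound for C6 + C5. The model deliberately ignores the memory-hierarchy effects listed under the risk factors.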
|
||||
|
||||
**Risk factors**:
|
||||
- Cache-miss sensitivity (Phase 74-2 showed +86% cache-misses from register pressure)
|
||||
- TLS struct bloat (each inline slot = ~8-16 bytes × capacity per class)
|
||||
- Memory hierarchy effects (L1-dcache pressure from TLS expansion)
|
||||
|
||||
---
|
||||
|
||||
## 6. Before/After Unified-STATS Baseline
|
||||
|
||||
### Current Baseline (Phase 69: WarmPool=16)
|
||||
|
||||
```
Mixed SSOT Throughput: 62.63 M ops/s (51.77% of mimalloc)
Target M2: 55% of mimalloc (~65.1 M ops/s baseline)
Remaining gap: +3.23pp
```
|
||||
|
||||
### Phase 75 (P2) Success Criteria
|
||||
|
||||
| Scenario | Throughput | vs Baseline | Status |
|----------|------------|-------------|--------|
| **GO** | ≥ 64.1 M ops/s | +2.4% | +0.8pp toward M2 |
| **NEUTRAL** | 61.6-64.1 M ops/s | ±1.5% | freeze, continue Phase 76 |
| **NO-GO** | ≤ 61.6 M ops/s | -1.6% | revert immediately |
|
||||
|
||||
**Strict gate**: +2.0% for structural change (TLS bloat risk)
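
The gate can be expressed as a small classification function. This is a sketch under stated assumptions: `GateResult`, `throughput_delta_pct`, and `classify_delta` are hypothetical names, and the thresholds are parameters because this document's table (+2.4% / -1.6%) and the Phase 75-1 A/B script (+1.0% / -1.0%) use different values.

```c
#include <assert.h>

typedef enum { GATE_NO_GO = -1, GATE_NEUTRAL = 0, GATE_GO = 1 } GateResult;

// Relative throughput change of treatment vs baseline, in percent.
double throughput_delta_pct(double baseline_mops, double treatment_mops) {
    assert(baseline_mops > 0.0);
    return (treatment_mops - baseline_mops) / baseline_mops * 100.0;
}

// Classify a delta against a GO threshold and a NO-GO threshold (both in percent).
// Anything strictly between the two thresholds is NEUTRAL.
GateResult classify_delta(double delta_pct, double go_pct, double no_go_pct) {
    if (delta_pct >= go_pct) return GATE_GO;
    if (delta_pct <= no_go_pct) return GATE_NO_GO;
    return GATE_NEUTRAL;
}
```

Keeping the thresholds as arguments makes it explicit which gate a given result was judged against.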
|
||||
|
||||
---
|
||||
|
||||
## 7. Risk Assessment: TLS Expansion vs Benefit
|
||||
|
||||
### TLS Struct Bloat Analysis
|
||||
|
||||
**Current TLS size** (estimated from Phase 69):
|
||||
- UnifiedCache entries: minimal (backend pointers only)
|
||||
- WarmPool SLL: ~2KB (Phase 69-71)
|
||||
- **Total TINY_MEM TLS: ~2-4KB per thread**
|
||||
|
||||
**Proposed P2 expansion** (inline slots for C4-C7):
|
||||
- C4 inline: 64 slots × 8 bytes = 512 bytes
|
||||
- C5 inline: 128 slots × 8 bytes = 1,024 bytes
|
||||
- C6 inline: 128 slots × 8 bytes = 1,024 bytes
|
||||
- C7 inline: ??? slots × 8 bytes = ???
|
||||
- **Total P2 expansion: ~2.5KB for C4+C5+C6 (512 + 1,024 + 1,024 bytes); roughly 4-5KB for all of C4-C7, depending on the C7 capacity**
|
||||
|
||||
**TLS Memory Trade-off**:
|
||||
- 10 threads × 4KB = **40KB system-wide** (negligible)
|
||||
- But **per-thread L1-dcache footprint** increases
|
||||
- L1-dcache pressure → potential cache evictions
|
||||
- Phase 74-2 showed this can dominate (cache-misses +86%)
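
The per-class byte counts and their L1-dcache footprint can be checked with a short calculation. A minimal sketch, assuming 64-byte cache lines (typical for x86-64, but an assumption here); `inline_slots_bytes` and `cache_lines_for` are hypothetical helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64  // assumption: typical x86-64 cache line size

// Bytes needed to store `slots` pointers in an inline slot array.
size_t inline_slots_bytes(size_t slots) {
    assert(slots <= SIZE_MAX / sizeof(void*));  // guard against overflow
    return slots * sizeof(void*);
}

// Number of cache lines spanned by `bytes`, rounded up.
size_t cache_lines_for(size_t bytes) {
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

On a 64-bit target, a 128-slot array occupies 1,024 bytes, that is 16 cache lines per class, which is the quantity that competes with the rest of the hot working set for L1-dcache.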
|
||||
|
||||
### Decision Gate
|
||||
|
||||
**Before proceeding with P2**:
|
||||
1. Gather C7 statistics (currently missing)
|
||||
2. Validate C6 > C5 > C4 > C7 ordering
|
||||
3. Decide: C6-only, C6+C5, or full C4-C7?
|
||||
4. Benchmark single-class inline (C6) first to validate ROI before expanding
|
||||
|
||||
---
|
||||
|
||||
## 8. Next Steps (User Decision Required)
|
||||
|
||||
### Option A: Proceed with C6-only P2 (Recommended - Lowest Risk)
|
||||
|
||||
**Approach**:
|
||||
- Implement inline slots for C6 only (highest volume, 57.2%)
|
||||
- Measure impact: target +1.5-2.5% throughput
|
||||
- If successful, expand to C5 in Phase 75-2
|
||||
|
||||
**Pros**: Lowest TLS bloat, highest ROI/risk ratio
|
||||
**Cons**: Multi-phase approach, requires two A/B cycles
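
The inline-slot mechanism proposed in Option A can be sketched as a small FIFO ring with a fallback signal on full or empty. This is an illustration only: `SketchRing` and the `sketch_ring_*` functions are hypothetical names, the capacity mirrors the C6 row of the table, and a real implementation would keep one instance per thread (for example with `__thread`) and fall back to the unified cache when push returns 0 or pop returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_CAPACITY 128  // hypothetical; mirrors the C6 capacity

// One slot is always left empty to distinguish "full" from "empty",
// so at most SKETCH_RING_CAPACITY - 1 pointers are stored.
typedef struct {
    void*   slots[SKETCH_RING_CAPACITY];
    uint8_t head;  // next pop position
    uint8_t tail;  // next push position
} SketchRing;

// Free path: returns 1 on success, 0 if the ring is full (caller falls back).
int sketch_ring_push(SketchRing* r, void* p) {
    assert(p != NULL);  // NULL is reserved as the "empty" result of pop
    uint8_t next = (uint8_t)((r->tail + 1) % SKETCH_RING_CAPACITY);
    if (next == r->head) return 0;
    r->slots[r->tail] = p;
    r->tail = next;
    return 1;
}

// Alloc path: returns the oldest pointer, or NULL if empty (caller falls back).
void* sketch_ring_pop(SketchRing* r) {
    if (r->head == r->tail) return NULL;
    void* p = r->slots[r->head];
    r->head = (uint8_t)((r->head + 1) % SKETCH_RING_CAPACITY);
    return p;
}

// Number of pointers currently stored.
int sketch_ring_count(const SketchRing* r) {
    return (r->tail - r->head + SKETCH_RING_CAPACITY) % SKETCH_RING_CAPACITY;
}
```

Because a zero-filled `SketchRing` has `head == tail == 0`, it is already a valid empty ring, so static zero-initialization is sufficient. The usable capacity of 127 is consistent with the 127/128 occupancy reported for C6 above.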
|
||||
|
||||
### Option B: Proceed with C6+C5 P2 (Moderate Risk)
|
||||
|
||||
**Approach**:
|
||||
- Implement inline slots for C6 + C5 (combined 85.7% of C4-C7 ops)
|
||||
- Measure impact: target +2.0-3.0% throughput
|
||||
- If successful, consolidate as Phase 75 final
|
||||
|
||||
**Pros**: Single A/B cycle, captures 85.7% of optimization opportunity
|
||||
**Cons**: Higher TLS bloat (~2KB), higher register pressure risk
|
||||
|
||||
### Option C: Defer P2 Until C7 Analysis
|
||||
|
||||
**Approach**:
|
||||
- Gather C7 statistics from separate OBSERVE run
|
||||
- Rank all four classes before targeting
|
||||
- Decide on C6/C5/C4/C7 balance based on full data
|
||||
|
||||
**Pros**: Data-driven decision, reduces risk of targeting wrong class
|
||||
**Cons**: Adds diagnostic cycle before implementation
|
||||
|
||||
---
|
||||
|
||||
## 9. Recommendation Summary
|
||||
|
||||
**PRIMARY RECOMMENDATION**: **Option A - Start with C6-only**
|
||||
|
||||
**Rationale**:
|
||||
1. C6 is clearly dominant (57.2% volume)
|
||||
2. Lowest TLS bloat (~1KB) reduces register pressure risk
|
||||
3. Conservative approach aligns with Phase 74 learnings (register pressure matters)
|
||||
4. Fail-fast: if C6 shows positive ROI, expand to C5; if NO-GO, iterate differently
|
||||
|
||||
**Secondary**: Gather C7 stats in parallel to validate completeness
|
||||
|
||||
**Decision**: **User choice** - provide approach preference before proceeding to Phase 75 implementation
|
||||
|
||||
---
|
||||
|
||||
## Artifacts
|
||||
|
||||
- **Baseline**: Mixed SSOT OBSERVE run: `./bench_random_mixed_hakmem_observe 20000000 400 1`
|
||||
- **Measurement**: Per-class Unified-STATS with `HAKMEM_MEASURE_UNIFIED_CACHE=1`
|
||||
- **Analysis**: This document (PHASE75_PERCLASS_ANALYSIS_0_SSOT.md)
|
||||
|
||||
---
|
||||
|
||||
## Timeline
|
||||
|
||||
- Phase 74 (P1/P0): UnifiedCache hit-path optimization → FROZEN (NEUTRAL)
|
||||
- Phase 75 (P2): Hot-class Inline Slots → **PENDING USER DECISION** (targeting strategy)
|
||||
- Phase 75-1: Implement selected class(es) → (next)
|
||||
- Phase 75-2: A/B test & results → (next)
|
||||
12
hakmem.d
@ -112,6 +112,11 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
|
||||
core/box/../front/../box/tiny_header_box.h \
|
||||
core/box/../front/../box/tiny_unified_lifo_box.h \
|
||||
core/box/../front/../box/tiny_unified_lifo_env_box.h \
|
||||
core/box/../front/../box/tiny_c6_inline_slots_env_box.h \
|
||||
core/box/../front/../box/../front/tiny_c6_inline_slots.h \
|
||||
core/box/../front/../box/../front/../box/tiny_c6_inline_slots_env_box.h \
|
||||
core/box/../front/../box/../front/../box/tiny_c6_inline_slots_tls_box.h \
|
||||
core/box/../front/../box/../front/../box/tiny_c6_inline_slots_env_box.h \
|
||||
core/box/../front/../box/tiny_front_cold_box.h \
|
||||
core/box/../front/../box/tiny_layout_box.h \
|
||||
core/box/../front/../box/tiny_hotheap_v2_box.h \
|
||||
@ -153,6 +158,7 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
|
||||
core/box/../front/../box/tiny_front_hot_box.h \
|
||||
core/box/../front/../box/tiny_metadata_cache_env_box.h \
|
||||
core/box/../front/../box/hakmem_env_snapshot_box.h \
|
||||
core/box/../front/../box/tiny_unified_cache_fastapi_env_box.h \
|
||||
core/box/../front/../box/tiny_ptr_convert_box.h \
|
||||
core/box/../front/../box/tiny_front_stats_box.h \
|
||||
core/box/../front/../box/free_path_stats_box.h \
|
||||
@ -372,6 +378,11 @@ core/box/../front/../box/../front/tiny_unified_cache.h:
|
||||
core/box/../front/../box/tiny_header_box.h:
|
||||
core/box/../front/../box/tiny_unified_lifo_box.h:
|
||||
core/box/../front/../box/tiny_unified_lifo_env_box.h:
|
||||
core/box/../front/../box/tiny_c6_inline_slots_env_box.h:
|
||||
core/box/../front/../box/../front/tiny_c6_inline_slots.h:
|
||||
core/box/../front/../box/../front/../box/tiny_c6_inline_slots_env_box.h:
|
||||
core/box/../front/../box/../front/../box/tiny_c6_inline_slots_tls_box.h:
|
||||
core/box/../front/../box/../front/../box/tiny_c6_inline_slots_env_box.h:
|
||||
core/box/../front/../box/tiny_front_cold_box.h:
|
||||
core/box/../front/../box/tiny_layout_box.h:
|
||||
core/box/../front/../box/tiny_hotheap_v2_box.h:
|
||||
@ -413,6 +424,7 @@ core/box/../front/../box/free_path_stats_box.h:
|
||||
core/box/../front/../box/tiny_front_hot_box.h:
|
||||
core/box/../front/../box/tiny_metadata_cache_env_box.h:
|
||||
core/box/../front/../box/hakmem_env_snapshot_box.h:
|
||||
core/box/../front/../box/tiny_unified_cache_fastapi_env_box.h:
|
||||
core/box/../front/../box/tiny_ptr_convert_box.h:
|
||||
core/box/../front/../box/tiny_front_stats_box.h:
|
||||
core/box/../front/../box/free_path_stats_box.h:
|
||||
|
||||
150
scripts/phase75_c6_inline_test.sh
Executable file
@ -0,0 +1,150 @@
|
||||
#!/bin/bash
|
||||
# Phase 75-1: C6 Inline Slots A/B Test
|
||||
#
|
||||
# Goal: Compare baseline (C6 inline OFF) vs treatment (C6 inline ON)
|
||||
# Decision Gate: +1.0% GO, ±1.0% NEUTRAL, -1.0% NO-GO
|
||||
#
|
||||
# Usage:
|
||||
# bash scripts/phase75_c6_inline_test.sh
|
||||
#
|
||||
# Output:
|
||||
# - Baseline: /tmp/c6_inline_baseline.log (10 runs, ENV=0)
|
||||
# - Treatment: /tmp/c6_inline_treatment.log (10 runs, ENV=1)
|
||||
# - Summary: Average throughput delta, decision recommendation
|
||||
|
||||
set -e # Exit on error
|
||||
|
||||
echo "========================================="
|
||||
echo "Phase 75-1: C6 Inline Slots A/B Test"
|
||||
echo "========================================="
|
||||
echo ""
|
||||
|
||||
# Verify we're in the hakmem directory
|
||||
if [ ! -f "Makefile" ]; then
|
||||
echo "ERROR: Must run from hakmem root directory"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Clean any previous builds
|
||||
echo "Cleaning previous builds..."
|
||||
make clean > /dev/null 2>&1
|
||||
|
||||
# ============================================================================
|
||||
# Baseline: C6 Inline OFF (ENV=0, default)
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "========================================="
|
||||
echo "BASELINE: Building with C6 inline OFF..."
|
||||
echo "========================================="
|
||||
# Test make directly: under 'set -e' a separate '$?' check would never be reached on failure
if ! make -j bench_random_mixed_hakmem > /tmp/c6_inline_build_baseline.log 2>&1; then
echo "ERROR: Baseline build failed. Check /tmp/c6_inline_build_baseline.log"
exit 1
fi
|
||||
echo "Build succeeded (log: /tmp/c6_inline_build_baseline.log)"
|
||||
|
||||
echo ""
|
||||
echo "Running baseline 10-run (WS=400, ITERS=20000000, HAKMEM_WARM_POOL_SIZE=16)..."
|
||||
echo ""
|
||||
|
||||
# Run baseline benchmark 10 times
|
||||
for i in {1..10}; do
|
||||
echo "=== Baseline Run $i/10 ==="
|
||||
HAKMEM_WARM_POOL_SIZE=16 HAKMEM_TINY_C6_INLINE_SLOTS=0 \
|
||||
./bench_random_mixed_hakmem 20000000 400 1 2>&1
|
||||
done > /tmp/c6_inline_baseline.log
|
||||
|
||||
echo "Baseline runs complete (log: /tmp/c6_inline_baseline.log)"
|
||||
|
||||
# ============================================================================
|
||||
# Treatment: C6 Inline ON (ENV=1)
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "========================================="
|
||||
echo "TREATMENT: Building with C6 inline ON..."
|
||||
echo "========================================="
|
||||
make clean > /dev/null 2>&1
|
||||
# Test make directly: under 'set -e' a separate '$?' check would never be reached on failure
if ! make -j bench_random_mixed_hakmem > /tmp/c6_inline_build_treatment.log 2>&1; then
echo "ERROR: Treatment build failed. Check /tmp/c6_inline_build_treatment.log"
exit 1
fi
|
||||
echo "Build succeeded (log: /tmp/c6_inline_build_treatment.log)"
|
||||
|
||||
echo ""
|
||||
echo "Running treatment 10-run with perf stat (WS=400, ITERS=20000000, ENV=1)..."
|
||||
echo ""
|
||||
|
||||
# Run treatment benchmark 10 times with perf stat
|
||||
for i in {1..10}; do
|
||||
echo "=== Treatment Run $i/10 (C6 INLINE=ON) ==="
|
||||
HAKMEM_WARM_POOL_SIZE=16 HAKMEM_TINY_C6_INLINE_SLOTS=1 \
|
||||
perf stat -e cycles,instructions,branches,branch-misses,cache-misses,dTLB-load-misses \
|
||||
./bench_random_mixed_hakmem 20000000 400 1 2>&1
|
||||
done > /tmp/c6_inline_treatment.log 2>&1
|
||||
|
||||
echo "Treatment runs complete (log: /tmp/c6_inline_treatment.log)"
|
||||
|
||||
# ============================================================================
|
||||
# Analysis: Extract throughput and calculate delta
|
||||
# ============================================================================
|
||||
|
||||
echo ""
|
||||
echo "========================================="
|
||||
echo "ANALYSIS: Throughput Comparison"
|
||||
echo "========================================="
|
||||
echo ""
|
||||
|
||||
# Extract throughput values (look for "ops/s" pattern)
|
||||
baseline_throughput=$(grep -oP '\d+\.\d+M ops/s' /tmp/c6_inline_baseline.log | sed 's/M ops\/s//' | awk '{sum+=$1; count++} END {if (count>0) print sum/count; else print "0"}')
|
||||
treatment_throughput=$(grep -oP '\d+\.\d+M ops/s' /tmp/c6_inline_treatment.log | sed 's/M ops\/s//' | awk '{sum+=$1; count++} END {if (count>0) print sum/count; else print "0"}')
|
||||
|
||||
# Calculate delta percentage
|
||||
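# Abort if no baseline throughput was parsed (avoids dividing by zero below)
if [ "$baseline_throughput" = "0" ]; then
echo "ERROR: No throughput values found in /tmp/c6_inline_baseline.log"
exit 1
fi
delta=$(echo "scale=2; (($treatment_throughput - $baseline_throughput) / $baseline_throughput) * 100" | bc)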
|
||||
|
||||
echo "Baseline Average: ${baseline_throughput}M ops/s (C6 inline OFF)"
|
||||
echo "Treatment Average: ${treatment_throughput}M ops/s (C6 inline ON)"
|
||||
echo "Delta: ${delta}%"
|
||||
echo ""
|
||||
|
||||
# Decision gate
|
||||
echo "========================================="
|
||||
echo "DECISION GATE (+1.0% GO threshold)"
|
||||
echo "========================================="
|
||||
echo ""
|
||||
|
||||
# Compare delta against thresholds
|
||||
if (( $(echo "$delta >= 1.0" | bc -l) )); then
|
||||
echo "Result: GO (+${delta}%)"
|
||||
echo ""
|
||||
echo "Recommendation:"
|
||||
echo " - Commit changes: 'Phase 75-1: C6-only Inline Slots (+${delta}%)'"
|
||||
echo " - Update CURRENT_TASK.md: Mark Phase 75-1 DONE"
|
||||
echo " - Proceed to Phase 75-2: Add C5 inline slots (85% coverage target)"
|
||||
elif (( $(echo "$delta <= -1.0" | bc -l) )); then
|
||||
echo "Result: NO-GO (${delta}%)"
|
||||
echo ""
|
||||
echo "Recommendation:"
|
||||
echo " - Revert all changes: 'git checkout -- .'"
|
||||
echo " - Document root cause in docs/analysis/PHASE75_C6_INLINE_SLOTS_FAILURE_ANALYSIS.md"
|
||||
echo " - Plan Phase 76: Alternative optimization axis (not hit-path)"
|
||||
else
|
||||
echo "Result: NEUTRAL (${delta}%)"
|
||||
echo ""
|
||||
echo "Recommendation:"
|
||||
echo " - Keep code (default OFF, no impact)"
|
||||
echo " - Freeze C6 optimization"
|
||||
echo " - Evaluate in Phase 76 or proceed to Phase 75-2 with caution"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "========================================="
|
||||
echo "Test complete!"
|
||||
echo ""
|
||||
echo "Logs:"
|
||||
echo " - Baseline: /tmp/c6_inline_baseline.log"
|
||||
echo " - Treatment: /tmp/c6_inline_treatment.log"
|
||||
echo " - Build logs: /tmp/c6_inline_build_*.log"
|
||||
echo "========================================="
|
||||