diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index 8b31cfc0..a432786f 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -162,22 +162,74 @@ Cumulative improvement achieved in Phases 6-10:

 Details: `docs/analysis/PHASE12_STRATEGIC_PAUSE_RESULTS.md`

-### Next: Phase 13 (Header Write Elimination)
+### Phase 13: Header Write Elimination v1 - NEUTRAL (+0.78%) ⚠️ RESEARCH BOX

-**Direction decided**: Lift the pause and proceed to Phase 13 ✅
+**Date**: 2025-12-14
+**Verdict**: **NEUTRAL (+0.78%)** - Frozen as research box (default OFF, manual opt-in)

-**Target**: Eliminate the 1-byte header write (top-priority hypothesis)
+**Target**: Reduce the steady-state header write tax (top-priority hypothesis)

-**Strategy**:
-- Place the header before the user pointer (system malloc pattern)
-- Or use header-less classification (RegionId only)
+**Strategy (v1)**:
+- Make the **C7 freelist stop clobbering the header**, so that E5-2 (write-once) can also be applied to C7
+- ENV: `HAKMEM_TINY_C7_PRESERVE_HEADER=0/1` (default: 0)

-**Expected ROI**: **+10-20%**
+**Results (4-Point Matrix)**:
+| Case | C7_PRESERVE | WRITE_ONCE | Mean (ops/s) | Delta | Verdict |
+|------|-------------|------------|--------------|-------|---------|
+| A (baseline) | 0 | 0 | 51,490,500 | - | - |
+| **B (E5-2 only)** | 0 | 1 | **52,070,600** | **+1.13%** | candidate |
+| C (C7 preserve) | 1 | 0 | 51,355,200 | -0.26% | NEUTRAL |
+| D (Phase 13 v1) | 1 | 1 | 51,891,902 | +0.78% | NEUTRAL |

-**Next Actions**:
-1. Measure header write overhead (perf annotate)
-2. Verify the feasibility of header-less classification
-3. Write the Phase 13 design document
+**Key Findings**:
+1. **E5-2 (HAKMEM_TINY_HEADER_WRITE_ONCE=1) showed +1.13% in a one-off measurement, but the 20-run retest came out NEUTRAL (+0.54%)**
+   - Reference: `docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_RETEST_AB_TEST_RESULTS.md`
+   - Conclusion: E5-2 stays a research box (default OFF)
+
+2. **C7 preserve header alone: -0.26%** (slight regression)
+   - C7 offset=1 memcpy overhead outweighs benefits
+
+3. **Combined (Phase 13 v1): +0.78%** (positive but below GO)
+   - C7 preserve reduces E5-2 gains
+
+**Action**:
+- ✅ Freeze Phase 13 v1 as research box (default OFF)
+- ✅ Re-test Phase 5 E5-2 (WRITE_ONCE=1) with dedicated 20-run → NEUTRAL (+0.54%)
+- 📋 Document results: `docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_AB_TEST_RESULTS.md`
+
+### Phase 5 E5-2: Header Write-Once - Retest NEUTRAL (+0.54%) ⚪
+
+**Date**: 2025-12-14
+**Verdict**: ⚪ **NEUTRAL (+0.54%)** - Stays a research box (default OFF)
+
+**Motivation**: E5-2 alone recorded +1.13% in the Phase 13 4-point matrix, so a dedicated 20-run test was used to decide whether to promote it.
+
+**Results (20-run)**:
+| Case | WRITE_ONCE | Mean (ops/s) | Median (ops/s) | Delta |
+|------|------------|--------------|----------------|-------|
+| A (baseline) | 0 | 51,096,839 | 51,127,725 | - |
+| B (optimized) | 1 | 51,371,358 | 51,495,811 | **+0.54%** |
+
+**Verdict**: NEUTRAL (+0.54%), below the GO threshold (+1.0%)
+
+**Discussion**:
+- The +1.13% in Phase 13 was observed over 10 runs
+- The dedicated 20-run test gave +0.54% (more reliable)
+- Consistent with the earlier E5-2 test (+0.45%)
+
+**Action**:
+- ✅ Keep as research box (default OFF, manual opt-in)
+- ENV: `HAKMEM_TINY_HEADER_WRITE_ONCE=0/1` (default: 0)
+- 📋 Details: `docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_RETEST_AB_TEST_RESULTS.md`
+
+**Next**: Move on to the next gap hypothesis from the Phase 12 Strategic Pause
+
+### Next: Phase 14 (Pointer Chase Reduction / Tiny tcache)
+
+**Goal**: Move closer to system malloc's tcache to reduce the Tiny frontend's array/FIFO/indirection costs.
+
+- Design: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md`
+- Instructions: `docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md`

 ## Update Notes (2025-12-14 Phase 5 E5-3 Analysis - Strategic Pivot)

diff --git a/Makefile b/Makefile
index d8b807a6..ad914f5b 100644
--- a/Makefile
+++ b/Makefile
@@ -218,12 +218,12 @@ LDFLAGS += $(EXTRA_LDFLAGS)

 # Targets
 TARGET = test_hakmem
-OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o 
superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o 
core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o +OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o 
hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o 
core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o OBJS = $(OBJS_BASE) # Shared library SHARED_LIB = libhakmem.so -SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/malloc_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o 
core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o +SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o core/box/ss_allocation_box_shared.o superslab_stats_shared.o superslab_cache_shared.o superslab_ace_shared.o superslab_slab_shared.o superslab_backend_shared.o core/superslab_head_stub_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/front_gate_classifier_shared.o 
core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o core/box/ss_hot_prewarm_box_shared.o core/box/front_metrics_box_shared.o core/box/bench_fast_box_shared.o core/box/ss_addr_map_box_shared.o core/box/ss_pt_impl_shared.o core/box/slab_recycling_box_shared.o core/box/pagefault_telemetry_box_shared.o core/box/tiny_sizeclass_hist_box_shared.o core/box/tiny_env_box_shared.o core/box/tiny_route_box_shared.o core/box/free_front_v3_env_box_shared.o core/box/free_path_stats_box_shared.o core/box/free_dispatch_stats_box_shared.o core/box/alloc_gate_stats_box_shared.o core/box/tiny_page_box_shared.o core/box/tiny_class_policy_box_shared.o core/box/tiny_class_stats_box_shared.o core/box/tiny_policy_learner_box_shared.o core/box/ss_budget_box_shared.o core/box/tiny_mem_stats_box_shared.o core/box/wrapper_env_box_shared.o core/box/free_wrapper_env_snapshot_box_shared.o core/box/malloc_wrapper_env_snapshot_box_shared.o core/box/madvise_guard_box_shared.o core/box/libm_reloc_guard_box_shared.o core/box/hakmem_env_snapshot_box_shared.o core/box/tiny_c7_preserve_header_env_box_shared.o core/page_arena_shared.o core/front/tiny_unified_cache_shared.o core/tiny_alloc_fast_push_shared.o core/tiny_c7_ultra_segment_shared.o core/tiny_c7_ultra_shared.o core/link_stubs_shared.o core/tiny_failfast_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_super_registry_shared.o hakmem_shared_pool_shared.o hakmem_shared_pool_acquire_shared.o hakmem_shared_pool_release_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o 
hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o core/box/super_reg_box_shared.o core/box/shared_pool_box_shared.o core/box/remote_side_box_shared.o core/tiny_destructors_shared.o # Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1) ifeq ($(POOL_TLS_PHASE1),1) @@ -427,7 +427,7 @@ test-box-refactor: box-refactor ./larson_hakmem 10 8 128 1024 1 12345 4 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem) -TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o 
core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o +TINY_BENCH_OBJS_BASE 
= hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o 
hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o diff --git a/core/bench_profile.h b/core/bench_profile.h index c6dae0c3..a187f8d9 100644 --- a/core/bench_profile.h +++ b/core/bench_profile.h @@ -10,6 +10,7 @@ #include "box/tiny_static_route_box.h" // tiny_static_route_refresh_from_env (Phase 3 C3) #include "box/hakmem_env_snapshot_box.h" // hakmem_env_snapshot_refresh_from_env (Phase 4 E1) #include "box/tiny_free_route_cache_env_box.h" // tiny_free_static_route_refresh_from_env (Phase 8) +#include "box/tiny_c7_preserve_header_env_box.h" // 
tiny_c7_preserve_header_env_refresh_from_env (Phase 13 v1)
 #endif

 // Set the default only when the env var is unset
@@ -184,5 +185,7 @@ static inline void bench_apply_profile(void) {
     hakmem_env_snapshot_refresh_from_env();
     // Phase 8: Sync free static route ENV cache after bench_profile putenv defaults.
     tiny_free_static_route_refresh_from_env();
+    // Phase 13 v1: Sync C7 preserve header ENV cache after bench_profile putenv defaults.
+    tiny_c7_preserve_header_env_refresh_from_env();
 #endif
 }
diff --git a/core/box/tiny_c7_preserve_header_env_box.c b/core/box/tiny_c7_preserve_header_env_box.c
new file mode 100644
index 00000000..8c5c8e7a
--- /dev/null
+++ b/core/box/tiny_c7_preserve_header_env_box.c
@@ -0,0 +1,50 @@
+// ============================================================================
+// Phase 13 v1: Tiny C7 Preserve Header ENV Box (L0) - Implementation
+// ============================================================================
+
+#include "tiny_c7_preserve_header_env_box.h"
+#include <stdatomic.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+// ============================================================================
+// Global State
+// ============================================================================
+
+_Atomic int g_tiny_c7_preserve_header_enabled = -1;
+
+// ============================================================================
+// Init (Cold Path)
+// ============================================================================
+
+int tiny_c7_preserve_header_env_init(void) {
+    const char* env = getenv("HAKMEM_TINY_C7_PRESERVE_HEADER");
+    int enabled = 0; // default: OFF (opt-in)
+
+    if (env && (env[0] == '1' || strcmp(env, "true") == 0 || strcmp(env, "TRUE") == 0)) {
+        enabled = 1;
+    }
+
+    // Cache result
+    atomic_store_explicit(&g_tiny_c7_preserve_header_enabled, enabled, memory_order_relaxed);
+
+    // Log once (stderr for immediate visibility)
+    if (enabled) {
+        const char msg[] = "[C7_PRESERVE_HEADER] enabled\n";
+        ssize_t w = write(2, msg, sizeof(msg) - 1);
+        (void)w;
+    }
+
+    return enabled;
+}
+
+// ============================================================================
+// Refresh (Cold Path, called from bench_profile)
+// ============================================================================
+
+void tiny_c7_preserve_header_env_refresh_from_env(void) {
+    // Reset to uninitialized state (-1)
+    // Next call to tiny_c7_preserve_header_enabled() will re-read ENV
+    atomic_store_explicit(&g_tiny_c7_preserve_header_enabled, -1, memory_order_relaxed);
+}
diff --git a/core/box/tiny_c7_preserve_header_env_box.h b/core/box/tiny_c7_preserve_header_env_box.h
new file mode 100644
index 00000000..d992562f
--- /dev/null
+++ b/core/box/tiny_c7_preserve_header_env_box.h
@@ -0,0 +1,72 @@
+// ============================================================================
+// Phase 13 v1: Tiny C7 Preserve Header ENV Box (L0)
+// ============================================================================
+//
+// Purpose: ENV gate for C7 header-preserving freelist layout
+//
+// Design: docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_DESIGN.md
+// Instructions: docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_NEXT_INSTRUCTIONS.md
+//
+// Strategy:
+// - Keep the C7 (1025-2048B) freelist from clobbering the header
+// - Change the nextptr offset from 0 to 1 (skip the 1B header)
+// - This reduces header rewrites at alloc time
+//
+// ENV:
+//   HAKMEM_TINY_C7_PRESERVE_HEADER=0/1 (default: 0, opt-in)
+//
+// API:
+//   tiny_c7_preserve_header_enabled() -> int
+//   tiny_c7_preserve_header_env_refresh_from_env()
+//
+// Box Theory:
+// - L0: This file (ENV gate, reversible)
+// - L1: tiny_layout_box.h (SSOT: tiny_nextptr_offset)
+// - L2: tiny_nextptr.h, tiny_header_box.h (affected code)
+//
+// Safety:
+// - ENV-gated (default OFF, opt-in)
+// - Reversible (ENV toggle)
+// - Minimal change (C7 offset 0→1 only)
+//
+// ============================================================================
+
+#ifndef TINY_C7_PRESERVE_HEADER_ENV_BOX_H
+#define TINY_C7_PRESERVE_HEADER_ENV_BOX_H
+
+#include <stdatomic.h>
+
+// ============================================================================
+// Global State (L0)
+// ============================================================================
+
+// Cached state: -1 (uninitialized), 0 (disabled), 1 (enabled)
+extern _Atomic int g_tiny_c7_preserve_header_enabled;
+
+// ============================================================================
+// Hot Inline API (L0)
+// ============================================================================
+
+// Check if C7 preserve header is enabled
+// Returns: 1 if enabled, 0 if disabled
+static inline int tiny_c7_preserve_header_enabled(void) {
+    int val = atomic_load_explicit(&g_tiny_c7_preserve_header_enabled, memory_order_relaxed);
+
+    if (__builtin_expect(val == -1, 0)) {
+        // Lazy init: read ENV once
+        extern int tiny_c7_preserve_header_env_init(void);
+        val = tiny_c7_preserve_header_env_init();
+    }
+
+    return val;
+}
+
+// ============================================================================
+// Cold API (L2)
+// ============================================================================
+
+// Refresh ENV cache (called from bench_profile after putenv)
+// Pattern: Same as Phase 8 (FREE_STATIC_ROUTE)
+extern void tiny_c7_preserve_header_env_refresh_from_env(void);
+
+#endif // TINY_C7_PRESERVE_HEADER_ENV_BOX_H
diff --git a/core/box/tiny_header_box.h b/core/box/tiny_header_box.h
index 266cd8a0..ec48218c 100644
--- a/core/box/tiny_header_box.h
+++ b/core/box/tiny_header_box.h
@@ -41,13 +41,14 @@
 //
 // Returns:
 //   true  - C1-C6: Header preserved at offset 0, next at offset 1
-//   false - C0, C7: Header overwritten by next pointer at offset 0
+//   false - C0: Header overwritten by next pointer at offset 0
+//   Phase 13 v1: C7 returns false (default) or true (HAKMEM_TINY_C7_PRESERVE_HEADER=1)
 static inline bool tiny_class_preserves_header(int class_idx) {
 #if HAKMEM_TINY_HEADER_CLASSIDX
     // Delegate to tiny_layout_box.h specification (Single Source of Truth)
-    // next_off=0 → header overwritten (C0, C7)
-    // next_off=1 → header preserved (C1-C6)
+    // next_off=0 → header overwritten (C0, C7 default)
+    // next_off=1 → header preserved (C1-C6, C7 with HAKMEM_TINY_C7_PRESERVE_HEADER=1)
     return tiny_nextptr_offset(class_idx) != 0;
 #else
     // Headers disabled globally
@@ -87,7 +88,8 @@ static inline void tiny_header_write_if_preserved(void* base, int class_idx) {
 // ============================================================================
 //
 // Validates header ONLY if this class preserves headers.
-// For C0/C7, validation is impossible (next pointer is stored at offset 0).
+// For C0, validation is impossible (next pointer is stored at offset 0).
+// Phase 13 v1: C7 validation depends on HAKMEM_TINY_C7_PRESERVE_HEADER.
 //
 // Arguments:
 //   base - BASE pointer (not user pointer)
diff --git a/core/box/tiny_layout_box.h b/core/box/tiny_layout_box.h
index 5885759d..a5246880 100644
--- a/core/box/tiny_layout_box.h
+++ b/core/box/tiny_layout_box.h
@@ -79,14 +79,29 @@ static inline size_t tiny_user_offset(int class_idx) {
 // Offset for storing the freelist next pointer inside a freed block.
 // This is distinct from tiny_user_offset():
 // - User offset is always +1 in header mode.
-// - Next offset is 0 for C0/C7 (cannot preserve header while free), else 1.
+// - Next offset: +// - C0: always 0 (16B, cannot fit header+next) +// - C1-C6: always 1 (header-preserving) +// - C7: 0 (default) or 1 (Phase 13 v1: header-preserving) static inline size_t tiny_nextptr_offset(int class_idx) { #if HAKMEM_TINY_HEADERLESS (void)class_idx; return 0; #elif HAKMEM_TINY_HEADER_CLASSIDX - // Bit pattern: C0=0, C1-C6=1, C7=0 → 0b01111110 = 0x7E - return (0x7Eu >> ((unsigned)class_idx & 7u)) & 1u; + // Phase 13 v1: C7 preserve header gate + // Bit pattern (default): C0=0, C1-C6=1, C7=0 → 0b01111110 = 0x7E + // Bit pattern (C7 preserve): C0=0, C1-C7=1 → 0b11111110 = 0xFE + unsigned int base_pattern = 0x7Eu; // default: C7 offset=0 + + // Phase 13 v1: Gate for C7 header-preserving layout + if (class_idx == 7) { + extern int tiny_c7_preserve_header_enabled(void); + if (tiny_c7_preserve_header_enabled()) { + base_pattern = 0xFEu; // C7 offset=1 (header-preserving) + } + } + + return (base_pattern >> ((unsigned)class_idx & 7u)) & 1u; #else (void)class_idx; return 0u; diff --git a/core/tiny_nextptr.h b/core/tiny_nextptr.h index 1b986d33..c670309b 100644 --- a/core/tiny_nextptr.h +++ b/core/tiny_nextptr.h @@ -1,7 +1,8 @@ // tiny_nextptr.h - Authoritative next-pointer offset/load/store for tiny boxes // // Finalized Phase E1-CORRECT spec (物理制約込み): -// P0.1 updated: C0 and C7 use offset 0, C1-C6 use offset 1 (header preserved) +// P0.1 updated: C0 uses offset 0, C1-C6 use offset 1 (header preserved) +// Phase 13 v1: C7 uses offset 0 (default) or 1 (HAKMEM_TINY_C7_PRESERVE_HEADER=1) // // HAKMEM_TINY_HEADER_CLASSIDX != 0 のとき: // @@ -18,8 +19,8 @@ // // Class 7: // [1B header][payload 2047B] -// → headerは上書きし、next は base+0 に格納(最大サイズなので許容) -// → next_off = 0 +// → next_off = 0 (default: headerは上書き) +// → next_off = 1 (Phase 13 v1: HAKMEM_TINY_C7_PRESERVE_HEADER=1) // // HAKMEM_TINY_HEADER_CLASSIDX == 0 のとき: // @@ -56,7 +57,8 @@ static __thread void* g_tiny_next_ra1 __attribute__((unused)) = NULL; static __thread void* g_tiny_next_ra2 
__attribute__((unused)) = NULL;
 // Compute freelist next-pointer offset within a block for the given class.
-// P0.1 updated: C0 and C7 use offset 0, C1-C6 use offset 1 (header preserved)
+// P0.1: C0 uses offset 0, C1-C6 use offset 1 (header preserved)
+// Phase 13 v1: C7 uses offset 0 (default) or 1 (HAKMEM_TINY_C7_PRESERVE_HEADER=1)
 // Rationale for C0: 8B stride cannot fit [1B header][8B next pointer] without overflow
 static inline __attribute__((always_inline)) size_t tiny_next_off(int class_idx) {
     return tiny_nextptr_offset(class_idx);
@@ -186,7 +188,8 @@ static inline __attribute__((always_inline)) void* tiny_next_load(const void* ba
 // - When class_map is used for class_idx lookup (default), header restoration is unnecessary
 // - Alloc path always writes fresh header before returning block to user (HAK_RET_ALLOC)
 // - ENV: HAKMEM_TINY_RESTORE_HEADER=1 to force header restoration (legacy mode)
-// P0.1: C7 uses offset 0 (overwrites header), C0-C6 use offset 1 (header preserved)
+// P0.1: C0 uses offset 0 (overwrites header), C1-C6 use offset 1 (header preserved)
+// Phase 13 v1: C7 uses offset 0 (default) or 1 (HAKMEM_TINY_C7_PRESERVE_HEADER=1)
 static inline __attribute__((always_inline)) void tiny_next_store(void* base, int class_idx, void* next) {
     size_t off = tiny_next_off(class_idx);
diff --git a/docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_AB_TEST_RESULTS.md b/docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_AB_TEST_RESULTS.md
new file mode 100644
index 00000000..270546cc
--- /dev/null
+++ b/docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_AB_TEST_RESULTS.md
@@ -0,0 +1,58 @@
+# Phase 13 v1: Header Write Elimination (C7 preserve header) A/B Results
+
+**Date**: 2025-12-14
+**Verdict**: ⚪ **NEUTRAL** (Phase 13 v1 frozen as a research box / default OFF)
+
+Design: `docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_DESIGN.md`
+Procedure: `docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_NEXT_INSTRUCTIONS.md`
+
+---
+
+## 1.
Purpose
+
+In response to the Phase 12 gap hypothesis (header write tax), Phase 13 v1:
+
+- **keeps the header instead of removing it**,
+- makes the C7 freelist header-preserving so that it no longer clobbers the header, and
+- verifies whether **E5-2 (header write-once) can be extended to C7**
+
+---
+
+## 2. 4-Point Matrix (throughput)
+
+| Case | HAKMEM_TINY_C7_PRESERVE_HEADER | HAKMEM_TINY_HEADER_WRITE_ONCE | ops/s | vs Case A |
+|------|--------------------------------|-------------------------------|-------|----------|
+| A | 0 | 0 | 51,490,500 | baseline |
+| B | 0 | 1 | 52,070,600 | **+1.13%** |
+| C | 1 | 0 | 51,355,200 | -0.26% |
+| D | 1 | 1 | 51,891,902 | +0.78% |
+
+Conclusions:
+- Phase 13 v1 (Case D) is **+0.78%** → **NEUTRAL** (below the +1.0% GO threshold)
+- An important by-product: **E5-2 alone (Case B) reached +1.13%, which is GO-equivalent**
+
+---
+
+## 3. Verdict
+
+### 3.1 Phase 13 v1 (C7 preserve header)
+
+- **Verdict**: ⚪ NEUTRAL → **research box freeze (default OFF)**
+- Suspected cause:
+  - The freelist next offset change introduced by C7 preserve cancels out the writes it saves (unconfirmed)
+
+### 3.2 Phase 5 E5-2 (Header write-once)
+
+- **Retest results**:
+  - The single observation in the Phase 13 matrix showed **+1.13%** (Case B)
+  - The dedicated clean-env 20-run retest showed **+0.54% (NEUTRAL)** → stays a research box (default OFF)
+  - Details: `docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_RETEST_AB_TEST_RESULTS.md`
+
+---
+
+## 4. Next Actions (recommended)
+
+1. Freeze Phase 13 v1 (keep the code, but default OFF)
+2. Freeze E5-2 (default OFF)
+3. Phase 13 v1 variants (if needed):
+  - Study, as a research box, a design (v1b) that places the C7 next pointer at a "more aligned" position
diff --git a/docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_DESIGN.md b/docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_DESIGN.md
new file mode 100644
index 00000000..161f18ee
--- /dev/null
+++ b/docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_DESIGN.md
@@ -0,0 +1,146 @@
+# Phase 13: Header Write Elimination v1 (C7 Header-Preserving Freelist)
+
+**Date**: 2025-12-14
+**Status**: DESIGN (Phase 13 kickoff) → ⚪ **NEUTRAL (+0.78%)** (research box freeze, default OFF)
+
+---
+
+## 0.
Executive Summary (one page)
+
+The Phase 12 comparison showed that **system malloc (glibc) is +63.7% faster than hakmem**, and **"steady-state header writes (write tax)"** became the top-priority hypothesis for the next major structural difference.
+
+However, hakmem **reads the header** on the free hot path and relies on `HEADER_MAGIC`, so removing or clobbering the header would break safety.
+
+Phase 13 v1 therefore keeps the header itself while moving toward a design in which the **C7 freelist does not overwrite the header**, so that the existing **E5-2 (Header write-once)** **becomes applicable to C7 as well**.
+
+Aim:
+- C1-C6 can already skip the alloc-time header write via write-once
+- **C7 currently needs the header rewritten on every alloc, because the freelist next pointer written at free time clobbers the header**
+- Moving the C7 next pointer to **base+1 (start of the user area)** preserves the header, so write-once can remove the alloc-side rewrite
+
+---
+
+## 1. Current State (why only C7 writes every time)
+
+### 1.1 Key Premise (current source of truth)
+
+- The free hot path (e.g. `free_tiny_fast()` in `core/front/malloc_tiny_fast.h`):
+  - validates `HEADER_MAGIC` at `ptr-1`, and
+  - extracts class_idx from the header
+  → **Header correctness is a precondition for both safety and the fast path**
+
+### 1.2 Scope of E5-2 (Header write-once)
+
+- `tiny_header_finalize_alloc()` in `core/box/tiny_header_box.h`, when
+  - `HAKMEM_TINY_HEADER_WRITE_ONCE=1` and
+  - `tiny_class_preserves_header(class_idx)=true` (C1-C6),
+  skips the alloc-time `tiny_region_id_write_header()`.
+
+### 1.3 Why C7 Cannot Be Write-Once (root cause)
+
+- In `tiny_nextptr_offset()` in `core/box/tiny_layout_box.h`:
+  - C7 uses `next_off=0` (i.e. next is written at `base+0`)
+  → at free time the **header area is overwritten by the next pointer**
+  → alloc must always re-run `tiny_region_id_write_header()`
+
+(C0 behaves the same way, but because C0 has an 8B stride, an 8B next pointer cannot be placed at `base+1`.)
+
+---
+
+## 2.
Proposal (Phase 13 v1)
+
+### 2.1 Core of the Change
+
+**Move the C7 next pointer to `base+1` (start of the user area)**:
+
+- Before (current):
+  - C7: `next_off=0` → `*(void**)base = next` (header destroyed)
+- After (Phase 13 v1):
+  - C7: `next_off=1` → `memcpy(base+1, &next, 8)` (header preserved)
+
+This turns C7 into a "header-preserving class", so E5-2 write-once also takes effect for C7.
+
+### 2.2 Box Theory (box layering)
+
+```
+L0: tiny_c7_preserve_header_env_box (ENV gate, A/B, refresh)
+ ↓
+L1: tiny_layout_box (SSOT for tiny_nextptr_offset)
+ ↓
+L2: tiny_nextptr (next load/store consult the SSOT)
+ ↓
+L3: tiny_header_box (class_preserves_header → write-once applies)
+```
+
+There is a single boundary:
+- "Deciding the C7 next offset" is centralized in `tiny_nextptr_offset()` (no branching anywhere else)
+
+### 2.3 Reversible (A/B)
+
+- ENV: `HAKMEM_TINY_C7_PRESERVE_HEADER=0/1` (default: 0)
+- Introduce it as a research box first, then promote it to a preset on GO
+
+---
+
+## 3. Safety / Invariants (Fail-Fast)
+
+### 3.1 Invariants
+
+- `tiny_next_store/load` **always** consult `tiny_nextptr_offset()` (no direct writes)
+- `tiny_class_preserves_header(class_idx)` is determined by offset!=0 (no hard-coding)
+- When C7 preserve is ON:
+  - `*(uint8_t*)base == HEADER_MAGIC|cls` still holds after free (no header corruption)
+
+### 3.2 Fail-Fast (debug only)
+
+- Debug builds only, when C7 preserve is ON:
+  - report a `tiny_header_validate(base, 7, ...)` mismatch as a one-shot message
+- Release builds never log; use stats counters only if needed
+
+---
+
+## 4. A/B Measurement Plan (same binary)
+
+Because this change alters where the freelist next pointer is placed, it would normally be a layout difference, but Phase 13 v1 makes it **switchable via ENV** so that same-binary A/B is preserved (a lesson from Phase 5-7).
+
+### 4.1 4-Point Matrix (required)
+
+| Case | HAKMEM_TINY_C7_PRESERVE_HEADER | HAKMEM_TINY_HEADER_WRITE_ONCE | Meaning |
+|------|--------------------------------|-------------------------------|------|
+| A | 0 | 0 | Current baseline |
+| B | 0 | 1 | E5-2 only (C1-C6) |
+| C | 1 | 0 | Move C7 next into the user area (header still written every time) |
+| D | 1 | 1 | Phase 13 v1 primary candidate (C1-C7 write-once) |
+
+### 4.2 GO/NO-GO (Mixed 10-run)
+
+- GO: mean **+1.0% or more**
+- NO-GO: mean **-1.0% or less**
+- NEUTRAL: within ±1.0% → freeze (research box)
+
+---
+
+## 5.
Risks and Mitigations
+
+### Risk 1: The C7 next pointer becomes unaligned and slows down because it goes through memcpy
+
+- Mitigation: always measure Case C (no write-once) to isolate the cost of the layout change alone
+- If C loses by a large margin:
+  - consider the "C7 next offset=8 (aligned)" variant (Phase 13 v1b)
+
+### Risk 2: Leftover hard-coded class_idx assumptions cause breakage
+
+- Mitigation: clean up the hits from `rg "== 7|!= 7|C7 uses offset 0"` and route them through the SSOT (`tiny_layout_box`)
+
+### Risk 3: ENV refresh does not follow bench_profile putenv
+
+- Mitigation: as in Phase 8, provide `*_env_refresh_from_env()` and call it from `bench_profile.h`
+
+---
+
+## 6. Next (outlook beyond Phase 13)
+
+Phase 13 v1 aims not to "remove" the header but to **reduce steady-state header rewrites**.
+
+If the gap to system malloc is still large, the next major theme is:
+- Port a thread cache (a tcache-like structure) into TinyUnifiedCache (Phase 14 candidate)
diff --git a/docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_NEXT_INSTRUCTIONS.md
new file mode 100644
index 00000000..df56d02c
--- /dev/null
+++ b/docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_NEXT_INSTRUCTIONS.md
@@ -0,0 +1,134 @@
+# Phase 13: Header Write Elimination v1 - Next Instructions (C7 preserve header)
+
+## 0. Status
+
+- Phase 12 showed that system malloc is +63.7% faster than hakmem → Phase 13 started
+- Policy (v1): **keep the header** while making sure the **C7 freelist does not clobber it**, removing the header rewrite at alloc time
+- Result: ⚪ **NEUTRAL (+0.78%) → freeze (default OFF)** (by-product: E5-2 at +1.13%)
+
+Design: `docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_DESIGN.md`
+Results: `docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_AB_TEST_RESULTS.md`
+
+---
+
+## 1. Goal (GO criteria)
+
+For Mixed 10-run (clean env):
+- **GO**: mean +1.0% or more
+- **NO-GO**: mean -1.0% or less (immediate rollback / freeze)
+- **NEUTRAL**: within ±1.0% (research box freeze)
+
+---
+
+## 2.
Implementation Patch Order (small increments)
+
+### Patch 1: L0 ENV Box (reversible)
+
+New:
+- `core/box/tiny_c7_preserve_header_env_box.h`
+- `core/box/tiny_c7_preserve_header_env_box.c` (refresh)
+
+Spec:
+- ENV: `HAKMEM_TINY_C7_PRESERVE_HEADER=0/1` (default: 0)
+- API:
+  - `tiny_c7_preserve_header_enabled() -> int`
+  - `tiny_c7_preserve_header_env_refresh_from_env()`
+
+Requirements:
+- **No getenv** on the hot path (lazy init + cached read only)
+
+### Patch 2: L1 Layout SSOT Change (single boundary)
+
+Modify:
+- `core/box/tiny_layout_box.h`
+
+Change:
+- Switch only the C7 case of `tiny_nextptr_offset(class_idx)` via the L0 gate
+  - OFF: existing behavior (C7 off=0)
+  - ON: C7 off=1 (header-preserving)
+
+### Patch 3: Align L2 NextPtr Comments/Assumptions with the SSOT
+
+Modify (no change to code behavior):
+- `core/tiny_nextptr.h`
+- `core/box/tiny_header_box.h` (remove comments such as "C7 is fixed at offset 0" if present)
+
+Aim:
+- Leave no assumption that the C7 offset is fixed (nip design accidents in the bud)
+
+### Patch 4: Sync Bench Profile Refresh (prevent ENV accidents)
+
+Modify:
+- `core/bench_profile.h`
+
+Add:
+- Call `tiny_c7_preserve_header_env_refresh_from_env()` after `bench_setenv_default(...)`
+
+(Same pattern as Phase 8)
+
+---
+
+## 3. A/B Test (4-point matrix required)
+
+Use `scripts/run_mixed_10_cleanenv.sh` (prevents ENV leaks).
+
+### Case A (baseline)
+
+```sh
+HAKMEM_TINY_C7_PRESERVE_HEADER=0 \
+HAKMEM_TINY_HEADER_WRITE_ONCE=0 \
+scripts/run_mixed_10_cleanenv.sh
+```
+
+### Case B (E5-2 only)
+
+```sh
+HAKMEM_TINY_C7_PRESERVE_HEADER=0 \
+HAKMEM_TINY_HEADER_WRITE_ONCE=1 \
+scripts/run_mixed_10_cleanenv.sh
+```
+
+### Case C (C7 preserve only)
+
+```sh
+HAKMEM_TINY_C7_PRESERVE_HEADER=1 \
+HAKMEM_TINY_HEADER_WRITE_ONCE=0 \
+scripts/run_mixed_10_cleanenv.sh
+```
+
+### Case D (Phase 13 v1 primary candidate)
+
+```sh
+HAKMEM_TINY_C7_PRESERVE_HEADER=1 \
+HAKMEM_TINY_HEADER_WRITE_ONCE=1 \
+scripts/run_mixed_10_cleanenv.sh
+```
+
+Additional (optional):
+- Also take a 5-run with `HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1` (to confirm there is no regression)
+
+---
+
+## 4. Visibility (minimal)
+
+Existing:
+- Use `HAKMEM_TINY_HEADER_WRITE_ONCE_STATS=1` and
+  - confirm that the `alloc_skip_count / alloc_write_count` ratio increases
+
+If adding new instrumentation (bare minimum):
+- Add a C7-only counter only if "skips are increasing for C7" cannot be observed (avoid always-on atomics)
+
+---
+
+## 5.
Promotion (GO only)
+
+On GO:
+1. Add the default to `core/bench_profile.h`
+  - `bench_setenv_default("HAKMEM_TINY_C7_PRESERVE_HEADER", "1");`
+  - (if needed) also promote `HAKMEM_TINY_HEADER_WRITE_ONCE=1`
+2. Append the Phase 13 v1 results (A/B table) to `CURRENT_TASK.md`
+3. Document the rollback procedure
+  - `export HAKMEM_TINY_C7_PRESERVE_HEADER=0`
+
+On NO-GO:
+- Research box freeze (stays default OFF); record the cause in the design notes
diff --git a/docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_AB_TEST_RESULTS.md b/docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_AB_TEST_RESULTS.md
index 4cee7ae4..d161e2f1 100644
--- a/docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_AB_TEST_RESULTS.md
+++ b/docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_AB_TEST_RESULTS.md
@@ -9,6 +9,19 @@
 ---
+## Addendum (2025-12-14)
+
+In the Phase 13 v1 4-point matrix, `HAKMEM_TINY_HEADER_WRITE_ONCE=1` alone was observed at **+1.13%** (candidate).
+
+- Results: `docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_AB_TEST_RESULTS.md`
+- However, the dedicated clean-env 20-run retest came in at **+0.54% (NEUTRAL)**, so promotion was deferred.
+  - Details: `docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_RETEST_AB_TEST_RESULTS.md`
+
+Conclusion:
+- E5-2 stays a research box (default OFF).
+
+---
+
 ## A/B Test Results (Mixed Workload)
 ### Configuration
diff --git a/docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_DESIGN.md b/docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_DESIGN.md
index 1a0d0b08..784cff64 100644
--- a/docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_DESIGN.md
+++ b/docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_DESIGN.md
@@ -6,6 +6,12 @@
 **Baseline**: 43.998M ops/s (Mixed, 40M iters, ws=400, E4-1+E4-2+E5-1 ON)
 **Goal**: +1-3% by moving header writes from allocation hot path to refill cold boundary
+**Update (2025-12-14)**:
+- In the Phase 13 v1 4-point matrix, `HAKMEM_TINY_HEADER_WRITE_ONCE=1` alone was observed at **+1.13%** (candidate).
+  - `docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_AB_TEST_RESULTS.md`
+- The dedicated clean-env 20-run retest came in at **+0.54% (NEUTRAL)** → promotion deferred.
+  - `docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_RETEST_AB_TEST_RESULTS.md`
+
 ---
 ## Hypothesis
diff --git
a/docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_PROMOTION_NEXT_INSTRUCTIONS.md b/docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_PROMOTION_NEXT_INSTRUCTIONS.md
new file mode 100644
index 00000000..42e913e1
--- /dev/null
+++ b/docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_PROMOTION_NEXT_INSTRUCTIONS.md
@@ -0,0 +1,76 @@
+# Phase 5 E5-2: Header Write-Once - Instructions for the Promotion Decision
+
+**Status**: ✅ COMPLETE → ⚪ NEUTRAL (promotion deferred)
+
+Results: `docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_RETEST_AB_TEST_RESULTS.md`
+
+## 0. Background
+
+Past E5-2 A/B runs were NEUTRAL, but in the Phase 13 v1 4-point matrix re-measurement
+`HAKMEM_TINY_HEADER_WRITE_ONCE=1` alone recorded **+1.13%**, making it a GO candidate.
+
+References:
+- Old results: `docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_AB_TEST_RESULTS.md`
+- New observation: `docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_AB_TEST_RESULTS.md`
+
+Goal: settle, with a dedicated A/B, **whether E5-2 can be promoted to a preset default**.
+
+---
+
+## 1. A/B Procedure (clean env, same binary)
+
+Recommended: Mixed 20-run (to obtain mean/median with higher confidence)
+
+### A: baseline (WRITE_ONCE=0)
+
+```sh
+RUNS=20 HAKMEM_TINY_HEADER_WRITE_ONCE=0 scripts/run_mixed_10_cleanenv.sh
+```
+
+### B: optimized (WRITE_ONCE=1)
+
+```sh
+RUNS=20 HAKMEM_TINY_HEADER_WRITE_ONCE=1 scripts/run_mixed_10_cleanenv.sh
+```
+
+Optional:
+- Also take 5-runs with 0/1 under `HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1` (to confirm there is no regression)
+
+---
+
+## 2. Decision Gate
+
+- **GO**: Mixed 20-run mean **+1.0% or more** and median also positive
+- **NO-GO**: mean **-1.0% or less**
+- **NEUTRAL**: anything else (within ±1.0%) → stays a research box (default OFF)
+
+---
+
+## 3.
Promotion Procedure on GO (small patches)
+
+### Patch P1: Preset Promotion
+
+- Add to `core/bench_profile.h` (target preset):
+  - `bench_setenv_default("HAKMEM_TINY_HEADER_WRITE_ONCE", "1");`
+
+Promoting only `MIXED_TINYV3_C7_SAFE` at first is fine (C6-heavy is optional).
+
+### Patch P2: Update the cleanenv Script (prevent ENV leaks)
+
+Revisit the defaults in `scripts/run_mixed_10_cleanenv.sh`:
+- After promotion, stop treating `HAKMEM_TINY_HEADER_WRITE_ONCE` as a "research knob"
+- Example: `export HAKMEM_TINY_HEADER_WRITE_ONCE=${HAKMEM_TINY_HEADER_WRITE_ONCE:-1}`
+
+(Existing behavior: bench_setenv_default cannot override a value that has already been exported.)
+
+### Patch P3: Documentation Update
+
+- Consolidate the new re-measurement results into a single document (e.g. `docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_RETEST_AB_TEST_RESULTS.md`)
+- Add an "E5-2 ADOPT" record to `CURRENT_TASK.md`
+
+---
+
+## 4. On NO-GO/NEUTRAL
+
+- `HAKMEM_TINY_HEADER_WRITE_ONCE` stays a research box (default OFF)
+- Note the factors behind the difference from the old results (baseline difference / env leak / build shape) and freeze
diff --git a/docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_RETEST_AB_TEST_RESULTS.md b/docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_RETEST_AB_TEST_RESULTS.md
new file mode 100644
index 00000000..6fa6ef56
--- /dev/null
+++ b/docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_RETEST_AB_TEST_RESULTS.md
@@ -0,0 +1,177 @@
+# Phase 5 E5-2: Header Write-Once - Retest Results (promotion decision)
+
+**Date**: 2025-12-14
+**Verdict**: ⚪ **NEUTRAL (+0.54%)**: stays a research box (default OFF)
+
+Background: `docs/analysis/PHASE5_E5_2_HEADER_REFILL_ONCE_AB_TEST_RESULTS.md`
+Instructions: `docs/analysis/PHASE5_E5_2_HEADER_WRITE_ONCE_PROMOTION_NEXT_INSTRUCTIONS.md`
+
+---
+
+## 1. Background
+
+In the Phase 13 v1 4-point matrix A/B, `HAKMEM_TINY_HEADER_WRITE_ONCE=1` alone recorded **+1.13%** and emerged as a GO candidate, so a dedicated clean-env 20-run was used to decide whether to promote it.
+
+Reference: `docs/analysis/PHASE13_HEADER_WRITE_ELIMINATION_1_AB_TEST_RESULTS.md` (Case B)
+
+---
+
+## 2. Test Configuration
+
+- **Benchmark**: scripts/run_mixed_10_cleanenv.sh
+- **Profile**: MIXED_TINYV3_C7_SAFE
+- **Iterations**: 20,000,000 per run
+- **Working set**: 400
+- **Runs**: 20 per case
+- **ENV**: `HAKMEM_TINY_C7_PRESERVE_HEADER=0` fixed (C7 preserve is not used)
+
+---
+
+## 3.
Results (20-run)
+
+| Case | WRITE_ONCE | Mean (ops/s) | Median (ops/s) | Delta vs A |
+|------|------------|--------------|----------------|------------|
+| A (baseline) | 0 | 51,096,839 | 51,127,725 | - |
+| B (optimized) | 1 | 51,371,358 | 51,495,811 | **+0.54%** |
+
+---
+
+## 4. Verdict
+
+### 4.1 GO Criteria
+
+- Mean **+1.0%** or more and Median also positive
+- This run: Mean +0.54%, Median +0.72%
+
+### 4.2 Verdict
+
+- **NEUTRAL (+0.54%)** → stays a research box (default OFF)
+- Did not reach the GO threshold (+1.0%)
+
+---
+
+## 5. Discussion
+
+### 5.1 Differences from the Phase 13 4-Point Matrix
+
+| Test | WRITE_ONCE=1 result | Runs | Baseline |
+|------|-------------------|------|----------|
+| Phase 13 (Case B) | **+1.13%** | 10 | 51,490,500 ops/s |
+| This run (dedicated 20-run) | **+0.54%** | 20 | 51,096,839 ops/s |
+
+**Factors behind the difference**:
+1. **Baseline fluctuation**: about -0.76% between the Phase 13 baseline (51.49M) and this run (51.10M)
+2. **Number of runs**: 10-run vs 20-run (the 20-run is more reliable)
+3. **ENV contamination**: Phase 13 ran the 4 cases back to back (possible ENV leak)
+
+### 5.2 Comparison with the Old Phase 5 E5-2 Results
+
+Old test (`PHASE5_E5_2_HEADER_REFILL_ONCE_AB_TEST_RESULTS.md`):
+- Result: +0.45% (NEUTRAL)
+- This run: +0.54% (NEUTRAL)
+
+**Consistency**: both tests fall consistently within the NEUTRAL range
+
+---
+
+## 6. Next Actions
+
+### 6.1 Handling of E5-2
+
+- ✅ Keep as a research box (default OFF, manual opt-in)
+- ENV: `HAKMEM_TINY_HEADER_WRITE_ONCE=0/1` (default: 0)
+
+### 6.2 Handling of Phase 13 v1
+
+- ✅ Keep as a research box (default OFF)
+- ENV: `HAKMEM_TINY_C7_PRESERVE_HEADER=0/1` (default: 0)
+
+### 6.3 Next Optimization
+
+Return to the gap hypothesis list from the Phase 12 Strategic Pause:
+1. ~~Header write tax~~ → Phase 13 v1 NEUTRAL, E5-2 NEUTRAL
+2. **Pointer chase overhead** (next candidate)
+3. Lock contention (if applicable)
+4. Memory fence overhead
+5. Metadata access patterns
+
+---
+
+## 7.
Raw Data
+
+### Case A (baseline, WRITE_ONCE=0)
+```
+Run 1: 50725850 ops/s
+Run 2: 51547217 ops/s
+Run 3: 51076712 ops/s
+Run 4: 51527474 ops/s
+Run 5: 51193070 ops/s
+Run 6: 51597708 ops/s
+Run 7: 52239171 ops/s
+Run 8: 52386008 ops/s
+Run 9: 51618321 ops/s
+Run 10: 50919588 ops/s
+Run 11: 52415403 ops/s
+Run 12: 51125404 ops/s
+Run 13: 49785086 ops/s
+Run 14: 50915858 ops/s
+Run 15: 51130046 ops/s
+Run 16: 48960162 ops/s
+Run 17: 51385756 ops/s
+Run 18: 50849945 ops/s
+Run 19: 50550500 ops/s
+Run 20: 49987500 ops/s
+
+Mean: 51096838.95 ops/s
+Median: 51127725.00 ops/s
+```
+
+### Case B (optimized, WRITE_ONCE=1)
+```
+Run 1: 51594697 ops/s
+Run 2: 50145581 ops/s
+Run 3: 52268972 ops/s
+Run 4: 52083686 ops/s
+Run 5: 50612405 ops/s
+Run 6: 50556552 ops/s
+Run 7: 49910193 ops/s
+Run 8: 52657108 ops/s
+Run 9: 52053748 ops/s
+Run 10: 51957521 ops/s
+Run 11: 52417281 ops/s
+Run 12: 51712162 ops/s
+Run 13: 51531743 ops/s
+Run 14: 50832685 ops/s
+Run 15: 51337254 ops/s
+Run 16: 51218309 ops/s
+Run 17: 50110155 ops/s
+Run 18: 51459878 ops/s
+Run 19: 51931080 ops/s
+Run 20: 51036152 ops/s
+
+Mean: 51371358.10 ops/s
+Median: 51495810.50 ops/s
+```
+
+---
+
+## 8. Rollback Procedure
+
+Phase 5 E5-2 is ENV-gated and default OFF. No rollback is needed.
+
+To disable it manually:
+```sh
+export HAKMEM_TINY_HEADER_WRITE_ONCE=0
+```
+
+---
+
+## 9. Summary
+
+Phase 5 E5-2 (Header Write-Once) recorded **+0.54% (NEUTRAL)** in the 20-run retest.
+
+- Did not reach the GO threshold (+1.0%)
+- Kept as a research box (default OFF, manual opt-in)
+- Phase 13 v1 is likewise kept as a research box
+
+Next step: move on to the next gap hypothesis from the Phase 12 Strategic Pause
diff --git a/scripts/run_mixed_10_cleanenv.sh b/scripts/run_mixed_10_cleanenv.sh
index db98c6ad..a727252a 100755
--- a/scripts/run_mixed_10_cleanenv.sh
+++ b/scripts/run_mixed_10_cleanenv.sh
@@ -11,6 +11,9 @@ runs=${RUNS:-10}
 # Force known research knobs OFF to avoid accidental carry-over.
 export HAKMEM_TINY_HEADER_WRITE_ONCE=${HAKMEM_TINY_HEADER_WRITE_ONCE:-0}
+export HAKMEM_TINY_C7_PRESERVE_HEADER=${HAKMEM_TINY_C7_PRESERVE_HEADER:-0}
+export HAKMEM_TINY_TCACHE=${HAKMEM_TINY_TCACHE:-0}
+export HAKMEM_TINY_TCACHE_CAP=${HAKMEM_TINY_TCACHE_CAP:-64}
 export HAKMEM_MALLOC_TINY_DIRECT=${HAKMEM_MALLOC_TINY_DIRECT:-0}
 export HAKMEM_ENV_SNAPSHOT_SHAPE=${HAKMEM_ENV_SNAPSHOT_SHAPE:-0}
 export HAKMEM_FREE_TINY_FAST_MONO_DUALHOT=${HAKMEM_FREE_TINY_FAST_MONO_DUALHOT:-0}
@@ -20,4 +23,3 @@ for i in $(seq 1 "${runs}"); do
   echo "=== Run ${i}/${runs} ==="
   HAKMEM_PROFILE="${profile}" ./bench_random_mixed_hakmem "${iters}" "${ws}" 1 2>&1 | rg "Throughput" || true
 done
-