Phase TLS-UNIFY-3: C6 intrusive freelist implementation (complete)

Implement the C6 ULTRA intrusive LIFO freelist with ENV gating:
- Singly linked LIFO with the next pointer stored at the user offset (base+1)
- tiny_next_store/tiny_next_load for pointer access (single source of truth)
- Segment learning via ss_fast_lookup (per-class seg_base/seg_end)
- ENV gate: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
- Counters: c6_ifl_push/pop/fallback in FREE_PATH_STATS

Files:
- core/box/tiny_ultra_tls_box.h: Added c6_head field for intrusive LIFO
- core/box/tiny_ultra_tls_box.c: Pop/push with intrusive branching (case 6)
- core/box/tiny_c6_ultra_intrusive_env_box.h: ENV gate (new)
- core/box/tiny_c6_intrusive_freelist_box.h: L1 pure LIFO (new)
- core/tiny_debug_ring.h: C6_IFL events
- core/box/free_path_stats_box.h/c: c6_ifl_* counters

A/B Test Results (1M iterations, ws=200, 257-512B):
- ENV_OFF (array): 56.6 Mop/s avg
- ENV_ON (intrusive): 57.6 Mop/s avg (+1.8%, within noise)
- Counters verified: c6_ifl_push=265890, c6_ifl_pop=265815, fallback=0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
100
CURRENT_TASK.md
@@ -1,3 +1,103 @@
# Mainline tasks (current)

## Next phase: Phase TLS-UNIFY-3-DESIGN (C6 ULTRA intrusive freelist design)

- Goal: design an intrusive freelist dedicated to C6 ULTRA (next pointer stored inside the block) and document how it is handled on top of TinyUltraTlsCtx.
- Work items:
  - Create `docs/analysis/ULTRA_C6_INTRUSIVE_FREELIST_DESIGN_V11B.md` covering:
    - the C6 block layout (next pointer position / header handling),
    - the alloc/free API for C6,
    - the migration plan from the existing C6 ULTRA to the v12 lane.
  - Write a note on consistency with the TLS unification (use the c6_* fields of TinyUltraTlsCtx; C4-C5 stay on array magazines for now).
- This phase is **design only**. Implementation follows in a later session.

---

## Phase TLS-UNIFY-2a: C4-C6 TLS unification - COMPLETED ✅

**Change**: Unified the C4-C6 ULTRA TLS into a single `TinyUltraTlsCtx` struct. The array-magazine scheme is kept, and C7 stays in its own box.

**A/B test results**:

| Workload | v11b-1 (Phase 1) | TLS-UNIFY-2a | Delta |
|----------|------------------|--------------|-------|
| Mixed 16-1024B | 8.0-8.8 Mop/s | 8.5-9.0 Mop/s | +0~5% |
| MID 257-768B | 8.5-9.0 Mop/s | 8.1-9.0 Mop/s | ±0% |

**Result**: The C4-C6 ULTRA TLS converged into the single TinyUltraTlsCtx box. Performance is equal or better, with no SEGV or assert failures ✅

---

## Phase v11b-1: Free Path Optimization - COMPLETED ✅

**Change**: Merged the serial ULTRA checks in `free_tiny_fast()` (C7→C6→C5→C4) into a single switch. Added a C7 early-exit.

**Results (vs v11a-5)**:

| Workload | v11a-5 | v11b-1 | Improvement |
|----------|--------|--------|-------------|
| Mixed 16-1024B | 45.4M | 50.7M | **+11.7%** |
| C6-heavy | 49.1M | 52.0M | **+5.9%** |
| C6-heavy + MID v3.5 | 53.1M | 53.6M | +0.9% |
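
The restructuring described for `free_tiny_fast()` can be illustrated with a small sketch. This is not the real function: the route enum, the `g_ultra_enabled[]` table, and all `free_route_*` names are hypothetical stand-ins, and the sketch only models the routing decision (a chain of serial checks versus a C7 early-exit followed by one switch), not the free itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical route results; the real allocator's types are not shown here.
typedef enum { ROUTE_LEGACY = 0, ROUTE_C4, ROUTE_C5, ROUTE_C6, ROUTE_C7 } free_route_t;

// Hypothetical per-class ULTRA switches (index = class_idx).
static bool g_ultra_enabled[8] = { false, false, false, false, true, true, true, true };

// Before: each ULTRA class is tested in turn (C7 -> C6 -> C5 -> C4).
static inline free_route_t free_route_serial(uint8_t class_idx) {
    if (class_idx == 7 && g_ultra_enabled[7]) return ROUTE_C7;
    if (class_idx == 6 && g_ultra_enabled[6]) return ROUTE_C6;
    if (class_idx == 5 && g_ultra_enabled[5]) return ROUTE_C5;
    if (class_idx == 4 && g_ultra_enabled[4]) return ROUTE_C4;
    return ROUTE_LEGACY;
}

// After: C7 exits early, everything else goes through one switch.
static inline free_route_t free_route_switch(uint8_t class_idx) {
    if (class_idx == 7 && g_ultra_enabled[7]) return ROUTE_C7;  // early-exit
    switch (class_idx) {
    case 6:  return g_ultra_enabled[6] ? ROUTE_C6 : ROUTE_LEGACY;
    case 5:  return g_ultra_enabled[5] ? ROUTE_C5 : ROUTE_LEGACY;
    case 4:  return g_ultra_enabled[4] ? ROUTE_C4 : ROUTE_LEGACY;
    default: return ROUTE_LEGACY;
    }
}

// Checks that both variants agree for every class index.
static inline void free_route_selfcheck(void) {
    for (uint8_t c = 0; c < 8; c++) {
        assert(free_route_serial(c) == free_route_switch(c));
    }
}
```

Both variants return the same route for every class; the difference is only in how many comparisons a non-C7 class has to pass through before it is dispatched.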

---

## Mainline profile decision

| Workload | MID v3.5 | Reason |
|----------|----------|--------|
| **Mixed 16-1024B** | OFF | LEGACY is fastest (45.4M ops/s) |
| **C6-heavy (257-512B)** | ON (C6-only) | +8% improvement (53.1M ops/s) |

ENV settings:
- `MIXED_TINYV3_C7_SAFE`: `HAKMEM_MID_V35_ENABLED=0`
- `C6_HEAVY_LEGACY_POOLV1`: `HAKMEM_MID_V35_ENABLED=1 HAKMEM_MID_V35_CLASSES=0x40`

---

# Phase v11a-5: Hot Path Optimization - COMPLETED

## Status: ✅ COMPLETE - Major performance improvement achieved

### Changes

1. **Hot path simplification**: merged `malloc_tiny_fast()` into a single switch
2. **C7 ULTRA early-exit**: C7 ULTRA exits early, before the policy snapshot (the biggest hot-path optimization)
3. **ENV checks moved**: all ENV checks are consolidated into policy init

### Results summary (vs v11a-4)

| Workload | v11a-4 Baseline | v11a-5 Baseline | Improvement |
|----------|-----------------|-----------------|-------------|
| Mixed 16-1024B | 38.6M | 45.4M | **+17.6%** |
| C6-heavy (257-512B) | 39.0M | 49.1M | **+26%** |

| Workload | v11a-4 MID v3.5 | v11a-5 MID v3.5 | Improvement |
|----------|-----------------|-----------------|-------------|
| Mixed 16-1024B | 40.3M | 41.8M | +3.7% |
| C6-heavy (257-512B) | 40.2M | 53.1M | **+32%** |

### v11a-5 internal comparison

| Workload | Baseline | MID v3.5 ON | Delta |
|----------|----------|-------------|-------|
| Mixed 16-1024B | 45.4M | 41.8M | -8% (LEGACY is faster) |
| C6-heavy (257-512B) | 49.1M | 53.1M | **+8.1%** |

### Conclusions

1. **Hot-path optimization yields large gains**: Baseline +17-26%, MID v3.5 ON +3-32%
2. **C7 early-exit is highly effective**: skipping the policy snapshot gains about 10M ops/s
3. **MID v3.5 is effective for C6-heavy**: +8% on C6-dominated workloads
4. **Baseline is best for Mixed workloads**: the LEGACY path is simple and fast

### Technical details

- C7 ULTRA early-exit: decided by `tiny_c7_ultra_enabled_env()` (static cached)
- Policy snapshot: TLS cache + version check (re-initialized only on version mismatch)
- Single switch: branches on route_kind[class_idx] (ULTRA/MID_V35/V7/MID_V3/LEGACY)
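
The "TLS cache + version check" idea from the technical details can be sketched as follows. This is a minimal model under stated assumptions, not the allocator's actual policy code: `policy_snapshot_t`, `g_policy_version`, `g_c6_mid_v35`, and the `policy_snapshot_*` / `route_for_class` functions are hypothetical names, and the global version is read without atomics, which a real multi-threaded implementation would need to revisit.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical route kinds, named after the list above.
typedef enum { ROUTE_LEGACY = 0, ROUTE_ULTRA, ROUTE_MID_V35, ROUTE_V7, ROUTE_MID_V3 } route_kind_t;

typedef struct {
    uint32_t     version;        // global policy version this snapshot was built from
    route_kind_t route_kind[8];  // one route per size class
} policy_snapshot_t;

// Hypothetical global policy state; bumping the version invalidates TLS snapshots.
static uint32_t g_policy_version = 1;
static int      g_c6_mid_v35 = 0;         // stand-in for an ENV-derived setting
static int      g_snapshot_rebuilds = 0;  // demo counter only

static __thread policy_snapshot_t t_snapshot;  // zeroed: version 0 = never built

static inline void policy_snapshot_rebuild(policy_snapshot_t* s) {
    for (int c = 0; c < 8; c++) s->route_kind[c] = ROUTE_LEGACY;
    s->route_kind[7] = ROUTE_ULTRA;
    if (g_c6_mid_v35) s->route_kind[6] = ROUTE_MID_V35;
    s->version = g_policy_version;
    g_snapshot_rebuilds++;
}

// Hot path: one compare in the common case; rebuild only on version mismatch.
static inline const policy_snapshot_t* policy_snapshot_get(void) {
    if (t_snapshot.version != g_policy_version) {
        policy_snapshot_rebuild(&t_snapshot);
    }
    return &t_snapshot;
}

static inline route_kind_t route_for_class(uint8_t class_idx) {
    return policy_snapshot_get()->route_kind[class_idx & 7];
}

// Walks through the lifecycle: first use, changed setting, version bump.
static inline void policy_snapshot_demo(void) {
    assert(route_for_class(6) == ROUTE_LEGACY);   // first call builds the snapshot
    assert(route_for_class(7) == ROUTE_ULTRA);    // served from the cached snapshot
    assert(g_snapshot_rebuilds == 1);

    g_c6_mid_v35 = 1;                             // the setting changes...
    assert(route_for_class(6) == ROUTE_LEGACY);   // ...but the version has not moved
    g_policy_version++;                           // publish the change
    assert(route_for_class(6) == ROUTE_MID_V35);  // mismatch triggers one rebuild
    assert(g_snapshot_rebuilds == 2);
}
```

The point of the design is that a changed setting only becomes visible once the version is bumped, so the hot path never has to re-read configuration on its own.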

---

# Phase v11a-4: MID v3.5 Mixed mainline test - COMPLETED

## Status: ✅ COMPLETE - C6→MID v3.5 adoption candidate
6
Makefile
@@ -218,7 +218,7 @@ LDFLAGS += $(EXTRA_LDFLAGS)

# Targets
TARGET = test_hakmem
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o 
core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o 
core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
OBJS = $(OBJS_BASE)
# Shared library
@@ -250,7 +250,7 @@ endif
# Benchmark targets
BENCH_HAKMEM = bench_allocators_hakmem
BENCH_SYSTEM = bench_allocators_system
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o 
core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o 
core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o bench_allocators_hakmem.o
BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
@@ -427,7 +427,7 @@ test-box-refactor: box-refactor
./larson_hakmem 10 8 128 1024 1 12345 4
# Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o 
hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/wrapper_env_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o 
hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o
TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
ifeq ($(POOL_TLS_PHASE1),1)
TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
@@ -44,5 +44,11 @@ static void free_path_stats_dump(void) {
            g_free_path_stats.legacy_by_class[6],
            g_free_path_stats.legacy_by_class[7]);

    // Phase TLS-UNIFY-3: C6 Intrusive Freelist stats
    fprintf(stderr, "[FREE_PATH_STATS_C6_IFL] push=%lu pop=%lu fallback=%lu\n",
            g_free_path_stats.c6_ifl_push,
            g_free_path_stats.c6_ifl_pop,
            g_free_path_stats.c6_ifl_fallback);

    fflush(stderr);
}
@@ -11,6 +11,9 @@ typedef struct FreePathStats {
    uint64_t c7_ultra_fast;
    uint64_t c6_ultra_free_fast; // Phase 4-2: C6 ULTRA-free
    uint64_t c6_ultra_alloc_hit; // Phase 4-4: C6 ULTRA-alloc (TLS pop)
    uint64_t c6_ifl_push; // Phase TLS-UNIFY-3: C6 intrusive push
    uint64_t c6_ifl_pop; // Phase TLS-UNIFY-3: C6 intrusive pop
    uint64_t c6_ifl_fallback; // Phase TLS-UNIFY-3: C6 intrusive fallback (slow)
    uint64_t c5_ultra_free_fast; // Phase 5-1: C5 ULTRA-free
    uint64_t c5_ultra_alloc_hit; // Phase 5-2: C5 ULTRA-alloc (TLS pop)
    uint64_t c4_ultra_free_fast; // Phase 6: C4 ULTRA-free (cap=64)
57
core/box/tiny_c6_intrusive_freelist_box.h
Normal file
@@ -0,0 +1,57 @@
// tiny_c6_intrusive_freelist_box.h - Phase TLS-UNIFY-3: C6 Intrusive Freelist L1 Box
//
// Pure LIFO operations on intrusive freelist (header-only / static inline).
// No side effects: does NOT touch seg/owner/remote/publish/stats.
//
// IMPORTANT: All next pointer access MUST go through tiny_next_* (no direct *(void**))
//
#ifndef HAKMEM_TINY_C6_INTRUSIVE_FREELIST_BOX_H
#define HAKMEM_TINY_C6_INTRUSIVE_FREELIST_BOX_H

#include <stdbool.h>
#include <stddef.h>
#include "../tiny_nextptr.h"

// ============================================================================
// C6 Fixed Wrappers (delegate to tiny_next_* "single source of truth")
// ============================================================================

// Load next pointer from freed block (at user offset = base+1)
static inline void* c6_ifl_next_load(void* base) {
    return tiny_next_load(base, 6); // class_idx=6, off=1
}

// Store next pointer to freed block (at user offset = base+1)
static inline void c6_ifl_next_store(void* base, void* next) {
    tiny_next_store(base, 6, next); // class_idx=6, off=1
}

// ============================================================================
// Pure LIFO Operations (no side effects)
// ============================================================================

// Push base to intrusive LIFO head
// Caller is responsible for count update and stats
static inline void c6_ifl_push(void** head, void* base) {
    c6_ifl_next_store(base, *head);
    *head = base;
}

// Pop from intrusive LIFO head
// Returns NULL if empty
// Caller is responsible for count update and stats
static inline void* c6_ifl_pop(void** head) {
    void* base = *head;
    if (base == NULL) {
        return NULL;
    }
    *head = c6_ifl_next_load(base);
    return base;
}

// Check if LIFO is empty
static inline bool c6_ifl_is_empty(void* head) {
    return head == NULL;
}

#endif // HAKMEM_TINY_C6_INTRUSIVE_FREELIST_BOX_H
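
To see the push/pop pair from this header in action without the rest of the allocator, here is a standalone sketch. It does not use the real `tiny_next_load`/`tiny_next_store` (they live in `tiny_nextptr.h`, which is not part of this diff). Instead it assumes, based on the "base+1" comments, that byte 0 of a block is a header and the link sits right after it. All `demo_*` names and the `0xA6` header value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Stand-ins for tiny_next_load/tiny_next_store. Assumption: the link is stored
// at base+1, so the access is unaligned and goes through memcpy.
static inline void* demo_next_load(const void* base) {
    void* next;
    memcpy(&next, (const unsigned char*)base + 1, sizeof next);
    return next;
}

static inline void demo_next_store(void* base, void* next) {
    memcpy((unsigned char*)base + 1, &next, sizeof next);
}

// Same shape as c6_ifl_push/c6_ifl_pop, but on the stand-in accessors.
static inline void demo_ifl_push(void** head, void* base) {
    demo_next_store(base, *head);
    *head = base;
}

static inline void* demo_ifl_pop(void** head) {
    void* base = *head;
    if (base == NULL) return NULL;
    *head = demo_next_load(base);
    return base;
}

// Pushes three blocks and checks that they come back in LIFO order.
static inline void demo_ifl_selfcheck(void) {
    unsigned char blocks[3][512];  // C6 serves 257-512B requests
    void* head = NULL;
    size_t count = 0;              // the caller owns the count, as the header says

    for (int i = 0; i < 3; i++) {
        blocks[i][0] = 0xA6;       // hypothetical header byte; must survive push/pop
        demo_ifl_push(&head, blocks[i]);
        count++;
    }
    for (int i = 2; i >= 0; i--) {
        void* p = demo_ifl_pop(&head);
        assert(p == (void*)blocks[i]);
        assert(blocks[i][0] == 0xA6);
        count--;
    }
    assert(count == 0 && head == NULL);
    assert(demo_ifl_pop(&head) == NULL);  // an empty list stays empty
}
```

Because the link is stored inside the freed block itself, the list needs no side array; the only per-thread state is the head pointer and the count kept by the caller.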
22
core/box/tiny_c6_ultra_intrusive_env_box.h
Normal file
@@ -0,0 +1,22 @@
// tiny_c6_ultra_intrusive_env_box.h - Phase TLS-UNIFY-3: C6 Intrusive FL ENV gate
//
// ENV: HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL (default OFF)
// Separate from existing HAKMEM_TINY_C6_ULTRA_FREE_ENABLED
//
#ifndef HAKMEM_TINY_C6_ULTRA_INTRUSIVE_ENV_BOX_H
#define HAKMEM_TINY_C6_ULTRA_INTRUSIVE_ENV_BOX_H

#include <stdlib.h>
#include <stdbool.h>

// Cached ENV gate (read once on first call)
static inline bool tiny_c6_ultra_intrusive_enabled(void) {
    static int g_enabled = -1; // -1 = not initialized
    if (g_enabled < 0) {
        const char* env = getenv("HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL");
        g_enabled = (env && env[0] == '1') ? 1 : 0;
    }
    return g_enabled == 1;
}

#endif // HAKMEM_TINY_C6_ULTRA_INTRUSIVE_ENV_BOX_H
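
Two properties of this gate are easy to miss: only the first character of the value is inspected, and the result of the first read is sticky for the rest of the process. The sketch below reproduces the same rule on a hypothetical variable name (`DEMO_INTRUSIVE_FL`) so it can run standalone. The `demo_gate_*` functions are illustrative and not part of the codebase, and POSIX `setenv` is assumed to be available.

```c
#define _POSIX_C_SOURCE 200112L  // for setenv under strict -std modes
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

// Same rule as the header above: only a value starting with '1' enables the gate.
static inline bool demo_gate_parse(const char* value) {
    return value != NULL && value[0] == '1';
}

// Cached gate on a hypothetical variable name.
static inline bool demo_gate_enabled(void) {
    static int cached = -1;  // -1 = not read yet
    if (cached < 0) {
        cached = demo_gate_parse(getenv("DEMO_INTRUSIVE_FL")) ? 1 : 0;
    }
    return cached == 1;
}

// Shows the parse rule and that the first read is sticky.
static inline void demo_gate_selfcheck(void) {
    assert(!demo_gate_parse(NULL));  // unset -> OFF (the default)
    assert(!demo_gate_parse(""));
    assert(!demo_gate_parse("0"));
    assert(!demo_gate_parse("on"));  // not recognized by this rule
    assert(demo_gate_parse("1"));
    assert(demo_gate_parse("10"));   // only the first character is inspected

    setenv("DEMO_INTRUSIVE_FL", "1", 1);
    assert(demo_gate_enabled());     // first call reads the environment
    setenv("DEMO_INTRUSIVE_FL", "0", 1);
    assert(demo_gate_enabled());     // cached: later changes are not observed
}
```

For A/B runs this means the variable has to be set before the process starts; flipping it mid-run has no effect once the gate has been queried.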
211
core/box/tiny_ultra_tls_box.c
Normal file
@ -0,0 +1,211 @@
// tiny_ultra_tls_box.c - Phase TLS-UNIFY-2a + TLS-UNIFY-3: Unified ULTRA TLS implementation
//
// Phase 1: Thin wrapper delegating to per-class TLS (completed)
// Phase 2a: Unified struct with array magazines for C4-C6 (completed)
// C7 remains in separate TinyC7Ultra box.
// Phase 3: C6 intrusive LIFO (current) - ENV gated
//

#include "tiny_ultra_tls_box.h"
#include "tiny_c7_ultra_box.h"
#include "free_path_stats_box.h"
#include "tiny_c6_ultra_intrusive_env_box.h" // Phase 3: ENV gate
#include "tiny_c6_intrusive_freelist_box.h" // Phase 3: L1 box
#include "../superslab/superslab_inline.h" // For ss_fast_lookup
#include "../tiny_debug_ring.h" // For ring visualization

#ifndef likely
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

// ============================================================================
// Phase TLS-UNIFY-2a: Unified TLS context for C4-C6
// ============================================================================

static __thread TinyUltraTlsCtx g_ultra_tls_ctx = {0};

TinyUltraTlsCtx* tiny_ultra_tls_ctx(void) {
    return &g_ultra_tls_ctx;
}

// ============================================================================
// Phase TLS-UNIFY-2a: Pop from unified TLS (C4-C6) or C7 separate box
// ============================================================================

void* tiny_ultra_tls_pop(uint8_t class_idx) {
    TinyUltraTlsCtx* ctx = &g_ultra_tls_ctx;

    switch (class_idx) {
        case 4:
            if (likely(ctx->c4_count > 0)) {
                return ctx->c4_freelist[--ctx->c4_count];
            }
            return NULL;

        case 5:
            if (likely(ctx->c5_count > 0)) {
                return ctx->c5_freelist[--ctx->c5_count];
            }
            return NULL;

        case 6:
            if (tiny_c6_ultra_intrusive_enabled()) {
                // Phase 3: intrusive LIFO
                void* base = c6_ifl_pop(&ctx->c6_head);
                if (base) {
                    ctx->c6_count--;
                    FREE_PATH_STAT_INC(c6_ifl_pop);
                    tiny_debug_ring_record(TINY_RING_EVENT_C6_IFL_POP, 6,
                                           (uintptr_t)base, ctx->c6_count);
                } else {
                    tiny_debug_ring_record(TINY_RING_EVENT_C6_IFL_EMPTY, 6,
                                           0, ctx->c6_count);
                }
                return base;
            } else {
                // Fallback: array magazine
                if (likely(ctx->c6_count > 0)) {
                    return ctx->c6_freelist[--ctx->c6_count];
                }
                return NULL;
            }

        case 7: {
            // C7 uses separate TinyC7Ultra box (not unified)
            tiny_c7_ultra_tls_t* c7ctx = tiny_c7_ultra_tls_get();
            if (likely(c7ctx->count > 0)) {
                return c7ctx->freelist[--c7ctx->count];
            }
            return NULL;
        }

        default:
            return NULL;
    }
}

// ============================================================================
// Phase TLS-UNIFY-2a: Push to unified TLS (C4-C6) or C7 separate box
// ============================================================================

// Forward declaration for slow path
extern void so_free(int class_idx, void* ptr);

// Slow path: hand this single block to so_free (the TLS cache itself is not flushed here)
static void tiny_ultra_tls_push_slow(uint8_t class_idx, void* base) {
    // Convert BASE to USER pointer for so_free
    void* user_ptr = (uint8_t*)base + 1;
    so_free(class_idx, user_ptr);
}

void tiny_ultra_tls_push(uint8_t class_idx, void* base) {
    TinyUltraTlsCtx* ctx = &g_ultra_tls_ctx;
    uintptr_t addr = (uintptr_t)base;

    switch (class_idx) {
        case 4:
            // Learn segment on first C4 free
            if (unlikely(ctx->c4_seg_base == 0)) {
                SuperSlab* ss = ss_fast_lookup(base);
                if (ss != NULL) {
                    ctx->c4_seg_base = (uintptr_t)ss;
                    ctx->c4_seg_end = ctx->c4_seg_base + (1u << ss->lg_size);
                }
            }
            // Check segment range and capacity
            if (likely(ctx->c4_seg_base != 0 &&
                       addr >= ctx->c4_seg_base &&
                       addr < ctx->c4_seg_end &&
                       ctx->c4_count < TINY_ULTRA_C4_CAP)) {
                ctx->c4_freelist[ctx->c4_count++] = base;
                FREE_PATH_STAT_INC(c4_ultra_free_fast);
                return;
            }
            tiny_ultra_tls_push_slow(class_idx, base);
            break;

        case 5:
            // Learn segment on first C5 free
            if (unlikely(ctx->c5_seg_base == 0)) {
                SuperSlab* ss = ss_fast_lookup(base);
                if (ss != NULL) {
                    ctx->c5_seg_base = (uintptr_t)ss;
                    ctx->c5_seg_end = ctx->c5_seg_base + (1u << ss->lg_size);
                }
            }
            if (likely(ctx->c5_seg_base != 0 &&
                       addr >= ctx->c5_seg_base &&
                       addr < ctx->c5_seg_end &&
                       ctx->c5_count < TINY_ULTRA_C5_CAP)) {
                ctx->c5_freelist[ctx->c5_count++] = base;
                FREE_PATH_STAT_INC(c5_ultra_free_fast);
                return;
            }
            tiny_ultra_tls_push_slow(class_idx, base);
            break;

        case 6:
            // Learn segment on first C6 free (common for both modes)
            if (unlikely(ctx->c6_seg_base == 0)) {
                SuperSlab* ss = ss_fast_lookup(base);
                if (ss != NULL) {
                    ctx->c6_seg_base = (uintptr_t)ss;
                    ctx->c6_seg_end = ctx->c6_seg_base + (1u << ss->lg_size);
                }
            }
            // Check segment range and capacity (common)
            if (likely(ctx->c6_seg_base != 0 &&
                       addr >= ctx->c6_seg_base &&
                       addr < ctx->c6_seg_end &&
                       ctx->c6_count < TINY_ULTRA_C6_CAP)) {
                if (tiny_c6_ultra_intrusive_enabled()) {
                    // Phase 3: intrusive LIFO
                    c6_ifl_push(&ctx->c6_head, base);
                    ctx->c6_count++;
                    FREE_PATH_STAT_INC(c6_ifl_push);
                    FREE_PATH_STAT_INC(c6_ultra_free_fast);
                    tiny_debug_ring_record(TINY_RING_EVENT_C6_IFL_PUSH, 6,
                                           (uintptr_t)base, ctx->c6_count);
                } else {
                    // Fallback: array magazine
                    ctx->c6_freelist[ctx->c6_count++] = base;
                    FREE_PATH_STAT_INC(c6_ultra_free_fast);
                }
                return;
            }
            // Slow path (range out or cap exceeded)
            if (tiny_c6_ultra_intrusive_enabled()) {
                FREE_PATH_STAT_INC(c6_ifl_fallback);
            }
            tiny_ultra_tls_push_slow(class_idx, base);
            break;

        case 7: {
            // C7 uses separate TinyC7Ultra box (not unified)
            tiny_c7_ultra_tls_t* c7ctx = tiny_c7_ultra_tls_get();
            if (unlikely(c7ctx->seg_base == 0)) {
                SuperSlab* ss = ss_fast_lookup(base);
                if (ss != NULL) {
                    c7ctx->seg_base = (uintptr_t)ss;
                    c7ctx->seg_end = c7ctx->seg_base + (1u << ss->lg_size);
                }
            }
            if (likely(c7ctx->seg_base != 0 &&
                       addr >= c7ctx->seg_base &&
                       addr < c7ctx->seg_end &&
                       c7ctx->count < TINY_C7_ULTRA_CAP)) {
                c7ctx->freelist[c7ctx->count++] = base;
                FREE_PATH_STAT_INC(c7_ultra_fast);
                return;
            }
            // Slow path for C7
            void* user_ptr = (uint8_t*)base + 1;
            so_free(7, user_ptr);
            break;
        }

        default:
            break;
    }
}
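The `c6_ifl_push` / `c6_ifl_pop` calls above come from `tiny_c6_intrusive_freelist_box.h`, which is not shown in this part of the diff. As a rough sketch of how an intrusive single-linked LIFO of this kind can work, the following stores the link inside the freed block itself. Everything here is an assumption for illustration: the `demo_*` names are hypothetical, and the link offset of 1 byte past BASE is a placeholder for whatever offset the real `tiny_next_store` / `tiny_next_load` helpers use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Assumed layout: 1-byte header at base[0], link stored right after it.
enum { DEMO_NEXT_OFFSET = 1 };

// memcpy keeps the store/load well defined even though the link is unaligned.
static void demo_next_store(void* base, void* next) {
    memcpy((uint8_t*)base + DEMO_NEXT_OFFSET, &next, sizeof next);
}

static void* demo_next_load(const void* base) {
    void* next;
    memcpy(&next, (const uint8_t*)base + DEMO_NEXT_OFFSET, sizeof next);
    return next;
}

// Push: the freed block becomes the new head and remembers the old head.
static void demo_ifl_push(void** head, void* base) {
    demo_next_store(base, *head);
    *head = base;
}

// Pop: return the head (or NULL when empty) and advance to its successor.
static void* demo_ifl_pop(void** head) {
    void* base = *head;
    if (base != NULL) {
        *head = demo_next_load(base);
    }
    return base;
}
```

With this shape the per-thread state shrinks to one head pointer plus a count, which is presumably why the array magazine is only kept for the ENV_OFF path.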
78
core/box/tiny_ultra_tls_box.h
Normal file
@ -0,0 +1,78 @@
// tiny_ultra_tls_box.h - Phase TLS-UNIFY-1: Unified ULTRA TLS API
//
// Goal: Single API for C4-C7 ULTRA TLS operations
// Phase 1: Thin wrapper delegating to existing TinyC*UltraFreeTLS
// Phase 2: Replace with unified struct (1 cache line hot path)
//
#ifndef HAKMEM_TINY_ULTRA_TLS_BOX_H
#define HAKMEM_TINY_ULTRA_TLS_BOX_H

#include <stdint.h>
#include <stdbool.h>

// ============================================================================
// TinyUltraTlsCtx - Unified TLS context (Phase TLS-UNIFY-2a + TLS-UNIFY-3)
// ============================================================================
//
// Phase 1: Thin wrapper delegating to per-class TLS (completed)
// Phase 2a: Unified struct with array magazines for C4-C6 (completed)
// C7 remains in separate TinyC7Ultra box.
// Phase 3: C6 intrusive LIFO (current) - ENV gated
//
// Capacity constants
#define TINY_ULTRA_C4_CAP 64
#define TINY_ULTRA_C5_CAP 64
#define TINY_ULTRA_C6_CAP 128

typedef struct TinyUltraTlsCtx {
    // Hot line: counts (8B aligned)
    uint16_t c4_count;
    uint16_t c5_count;
    uint16_t c6_count;
    uint16_t _pad_count;

    // C6 intrusive LIFO head (Phase TLS-UNIFY-3)
    // Used when HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1
    void* c6_head;

    // Per-class segment ranges (learned on first free)
    uintptr_t c4_seg_base;
    uintptr_t c4_seg_end;
    uintptr_t c5_seg_base;
    uintptr_t c5_seg_end;
    uintptr_t c6_seg_base;
    uintptr_t c6_seg_end;

    // Per-class array magazines (C4/C5 always, C6 when intrusive OFF)
    void* c4_freelist[TINY_ULTRA_C4_CAP]; // 512B
    void* c5_freelist[TINY_ULTRA_C5_CAP]; // 512B
    void* c6_freelist[TINY_ULTRA_C6_CAP]; // 1024B (kept for ENV_OFF fallback)
    // Total: ~2KB per thread (acceptable for array magazine design)

    // Note: C7 is NOT included here - uses separate TinyC7Ultra box
} TinyUltraTlsCtx;

// ============================================================================
// Unified API
// ============================================================================

// Get the calling thread's unified TLS context
TinyUltraTlsCtx* tiny_ultra_tls_ctx(void);

// Pop BASE pointer from TLS freelist (C4-C7)
// Returns: BASE pointer on hit, NULL on miss (caller should fallback)
// class_idx: 4, 5, 6, or 7
void* tiny_ultra_tls_pop(uint8_t class_idx);

// Push BASE pointer to TLS freelist (C4-C7)
// class_idx: 4, 5, 6, or 7
// base: BASE pointer (not user pointer)
void tiny_ultra_tls_push(uint8_t class_idx, void* base);

// Check if unified TLS is enabled (no ENV lookup yet; always on)
static inline int tiny_ultra_tls_unified_enabled(void) {
    // Always enabled since Phase 1 (thin wrapper mode)
    return 1;
}

#endif // HAKMEM_TINY_ULTRA_TLS_BOX_H
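The array-magazine path combines two ideas declared in this header: a segment range learned on the first free, and a bounded LIFO array. A small sketch of that combination is shown below, assuming a power-of-two segment size. `DemoMagazine` and the `demo_*` functions are hypothetical stand-ins, not the real `TinyUltraTlsCtx` API. The sketch widens the `1` to `uintptr_t` before shifting, a defensive choice for this example that keeps the shift well defined if the exponent ever reached 32 or more.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_CAP 4

typedef struct {
    uintptr_t seg_base;   // 0 means "segment not learned yet"
    uintptr_t seg_end;    // exclusive upper bound
    uint16_t  count;
    void*     slots[DEMO_CAP];
} DemoMagazine;

// Record the half-open range [seg_base, seg_base + 2^lg_size).
static void demo_learn_segment(DemoMagazine* m, uintptr_t seg_base, unsigned lg_size) {
    m->seg_base = seg_base;
    m->seg_end  = seg_base + ((uintptr_t)1 << lg_size);
}

// Returns 1 if the block was cached, 0 if the caller must take the slow path.
static int demo_push(DemoMagazine* m, void* base) {
    uintptr_t addr = (uintptr_t)base;
    if (m->seg_base != 0 &&
        addr >= m->seg_base &&
        addr < m->seg_end &&
        m->count < DEMO_CAP) {
        m->slots[m->count++] = base;
        return 1;
    }
    return 0;
}

// Returns the most recently pushed block, or NULL when the magazine is empty.
static void* demo_pop(DemoMagazine* m) {
    return (m->count > 0) ? m->slots[--m->count] : NULL;
}
```

A caller would treat a 0 from `demo_push` the same way the real code treats a range or capacity miss: fall through to the slower free path.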
@ -52,6 +52,7 @@
 #include "../box/tiny_c6_ultra_free_box.h" // Phase 4-2: C6 ULTRA-free (free-only, C6-only)
 #include "../box/tiny_c5_ultra_free_box.h" // Phase 5-1/5-2: C5 ULTRA-free + alloc integration
 #include "../box/tiny_c4_ultra_free_box.h" // Phase 6: C4 ULTRA-free + alloc integration (cap=64)
+#include "../box/tiny_ultra_tls_box.h" // Phase TLS-UNIFY-1: Unified ULTRA TLS API
 #include "../box/tiny_ultra_classes_box.h" // Phase REFACTOR-1: Named constants for C4-C7
 #include "../box/tiny_legacy_fallback_box.h" // Phase REFACTOR-2: Legacy fallback logic unification
 #include "../box/tiny_ptr_convert_box.h" // Phase REFACTOR-3: Inline pointer macro centralization
@ -130,175 +131,84 @@ static inline void* malloc_tiny_fast(size_t size) {
     // Phase ALLOC-GATE-OPT-1: counter instrumentation (1. function entry)
     ALLOC_GATE_STAT_INC(total_calls);
 
-    const int front_v3_on = tiny_front_v3_enabled();
-    const TinyFrontV3Snapshot* front_snap =
-        __builtin_expect(front_v3_on, 0) ? tiny_front_v3_snapshot_get() : NULL;
-    const bool route_fast_on = front_v3_on && tiny_front_v3_lut_enabled() &&
-                               tiny_front_v3_route_fast_enabled();
-
-    int class_idx = -1;
-    tiny_route_kind_t route = TINY_ROUTE_LEGACY;
-    bool route_trusted = false;
-
-    if (front_v3_on && tiny_front_v3_lut_enabled()) {
-        const TinyFrontV3SizeClassEntry* e = tiny_front_v3_lut_lookup(size);
-        if (e && e->class_idx != TINY_FRONT_V3_INVALID_CLASS) {
-            class_idx = (int)e->class_idx;
-            route = (tiny_route_kind_t)e->route_kind;
-            route_trusted = route_fast_on;
-        }
-    }
-
+    // Phase v11a-5: Simplified hot path with C7 ULTRA early-exit
+    // 1. size → class_idx (single call)
+    ALLOC_GATE_STAT_INC(size_to_class_calls);
+    int class_idx = hak_tiny_size_to_class(size);
     if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
-        // Phase ALLOC-GATE-OPT-1: counter instrumentation (2. size→class conversion)
-        ALLOC_GATE_STAT_INC(size_to_class_calls);
-        class_idx = hak_tiny_size_to_class(size);
-        if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
-            return NULL;
-        }
-        // Phase ALLOC-GATE-OPT-1: counter instrumentation (3. route_for_class call)
-        ALLOC_GATE_STAT_INC(route_for_class_calls);
-        route = tiny_route_for_class((uint8_t)class_idx);
-        route_trusted = false;
-    } else if (!route_trusted &&
-               route != TINY_ROUTE_LEGACY && route != TINY_ROUTE_HEAP &&
-               route != TINY_ROUTE_HOTHEAP_V2 &&
-               route != TINY_ROUTE_SMALL_HEAP_V6 && route != TINY_ROUTE_SMALL_HEAP_V7) {
-        // Phase ALLOC-GATE-OPT-1: counter instrumentation (3. route_for_class call)
-        // Note: v3/v4/v5 removed in Phase v10
-        ALLOC_GATE_STAT_INC(route_for_class_calls);
-        route = tiny_route_for_class((uint8_t)class_idx);
+        return NULL;
     }
 
     tiny_front_alloc_stat_inc(class_idx);
-
-    // Phase ALLOC-GATE-OPT-1: counter instrumentation (4. per-class distribution)
     ALLOC_GATE_STAT_INC_CLASS(class_idx);
 
-    // C7 ULTRA allocation path (ENV: HAKMEM_TINY_C7_ULTRA_ENABLED, default ON)
-    if (tiny_class_is_c7(class_idx) && tiny_c7_ultra_enabled_env()) {
-        // Phase ALLOC-GATE-OPT-1: counter instrumentation (5. C7 ULTRA ENV check)
-        ALLOC_GATE_STAT_INC(env_checks);
+    // Phase v11a-5b: C7 ULTRA early-exit (skip policy snapshot for common case)
+    // This is the most common hot path - avoids TLS policy overhead
+    if (class_idx == 7 && tiny_c7_ultra_enabled_env()) {
         void* ultra_p = tiny_c7_ultra_alloc(size);
         if (TINY_HOT_LIKELY(ultra_p != NULL)) {
             return ultra_p;
         }
-        // fallback to route on miss
+        // C7 ULTRA miss → fall through to policy-based routing
     }
 
-    // Phase 4-4: C6 ULTRA free+alloc integration (parasitic TLS cache pop)
-    if (tiny_class_is_c6(class_idx) && tiny_c6_ultra_free_enabled()) {
-        // Phase ALLOC-GATE-OPT-1: counter instrumentation (5. C6 ULTRA ENV check)
-        ALLOC_GATE_STAT_INC(env_checks);
-        TinyC6UltraFreeTLS* ctx = tiny_c6_ultra_free_tls();
-        if (TINY_HOT_LIKELY(ctx->count > 0)) {
-            void* base = ctx->freelist[--ctx->count];
-            // Phase 4-4: counter instrumentation (TLS pop)
-            FREE_PATH_STAT_INC(c6_ultra_alloc_hit);
-            // Still a BASE pointer here; convert it to a USER pointer and return it
-            // (assumes the header is already at base[0])
-            return tiny_base_to_user_inline(base);
-        }
-    }
-
-    // Phase 5-2: C5 ULTRA free+alloc integration (same pattern as C6)
-    if (tiny_class_is_c5(class_idx) && tiny_c5_ultra_free_enabled()) {
-        // Phase ALLOC-GATE-OPT-1: counter instrumentation (5. C5 ULTRA ENV check)
-        ALLOC_GATE_STAT_INC(env_checks);
-        TinyC5UltraFreeTLS* ctx = tiny_c5_ultra_free_tls();
-        if (TINY_HOT_LIKELY(ctx->count > 0)) {
-            void* base = ctx->freelist[--ctx->count];
-            FREE_PATH_STAT_INC(c5_ultra_alloc_hit);
-            return tiny_base_to_user_inline(base);
-        }
-    }
-
-    // Phase 6: C4 ULTRA free+alloc integration (same pattern as C5/C6, cap=64)
-    if (tiny_class_is_c4(class_idx) && tiny_c4_ultra_free_enabled()) {
-        // Phase ALLOC-GATE-OPT-1: counter instrumentation (5. C4 ULTRA ENV check)
-        ALLOC_GATE_STAT_INC(env_checks);
-        TinyC4UltraFreeTLS* ctx = tiny_c4_ultra_free_tls();
-        if (TINY_HOT_LIKELY(ctx->count > 0)) {
-            void* base = ctx->freelist[--ctx->count];
-            FREE_PATH_STAT_INC(c4_ultra_alloc_hit);
-            return tiny_base_to_user_inline(base);
-        }
-    }
-
-    // Phase v7-4: Check Policy Box for v7/MID_V35 routing (before switch)
+    // 2. Policy snapshot (TLS cached, single read)
     const SmallPolicyV7* policy = small_policy_v7_snapshot();
+    SmallRouteKind route_kind = policy->route_kind[class_idx];
 
-    // Phase v11a-3: MID v3.5 routing
-    if (policy->route_kind[class_idx] == SMALL_ROUTE_MID_V35) {
-        void* v35p = small_mid_v35_alloc(class_idx, size);
-        if (TINY_HOT_LIKELY(v35p != NULL)) {
-            return v35p;
+    // 3. Single switch on route_kind (all ENV checks moved to Policy init)
+    switch (route_kind) {
+        case SMALL_ROUTE_ULTRA: {
+            // Phase TLS-UNIFY-1: Unified ULTRA TLS pop for C4-C6 (C7 handled above)
+            void* base = tiny_ultra_tls_pop((uint8_t)class_idx);
+            if (TINY_HOT_LIKELY(base != NULL)) {
+                if (class_idx == 6) FREE_PATH_STAT_INC(c6_ultra_alloc_hit);
+                else if (class_idx == 5) FREE_PATH_STAT_INC(c5_ultra_alloc_hit);
+                else if (class_idx == 4) FREE_PATH_STAT_INC(c4_ultra_alloc_hit);
+                return tiny_base_to_user_inline(base);
+            }
+            // ULTRA miss → fallback to LEGACY
+            break;
         }
-        // v35 returned NULL -> fallback to legacy
-    }
 
-    if (policy->route_kind[class_idx] == SMALL_ROUTE_V7) {
-        void* v7p = small_heap_alloc_fast_v7_stub(size, (uint8_t)class_idx);
-        if (TINY_HOT_LIKELY(v7p != NULL)) {
-            return v7p;
+        case SMALL_ROUTE_MID_V35: {
+            // Phase v11a-3: MID v3.5 allocation
+            void* v35p = small_mid_v35_alloc(class_idx, size);
+            if (TINY_HOT_LIKELY(v35p != NULL)) {
+                return v35p;
+            }
+            // MID v3.5 miss → fallback to LEGACY
+            break;
         }
-        // v7 stub returned NULL -> fallback to legacy
-    }
 
-    switch (route) {
-        case TINY_ROUTE_SMALL_HEAP_V7: {
-            // Phase v7-4: v7 routing now handled by Policy Box above (kept for legacy compatibility)
+        case SMALL_ROUTE_V7: {
+            // Phase v7: SmallObject v7 allocation (research box)
             void* v7p = small_heap_alloc_fast_v7_stub(size, (uint8_t)class_idx);
             if (TINY_HOT_LIKELY(v7p != NULL)) {
                 return v7p;
             }
-            // v7 stub returned NULL -> fallback to legacy
+            // V7 miss → fallback to LEGACY
             break;
         }
-        case TINY_ROUTE_SMALL_HEAP_V6: {
-            // Phase V6-HDR-2: Headerless alloc (ENV gated)
-            if (small_v6_headerless_route_enabled((uint8_t)class_idx)) {
-                SmallHeapCtxV6* ctx_v6 = small_heap_ctx_v6();
-                void* v6p = small_v6_headerless_alloc(ctx_v6, (uint8_t)class_idx);
-                if (TINY_HOT_LIKELY(v6p != NULL)) {
-                    return v6p; // No header write needed - done in refill
-                }
-                // v6 returned NULL -> fallback to legacy
-            }
-            __attribute__((fallthrough));
-        }
-        // Phase v10: v3/v4/v5 removed - routes now handled as LEGACY
-        case TINY_ROUTE_HOTHEAP_V2: {
-            void* v2p = tiny_hotheap_v2_alloc((uint8_t)class_idx);
-            if (TINY_HOT_LIKELY(v2p != NULL)) {
-                return v2p;
-            }
-            tiny_hotheap_v2_record_route_fallback((uint8_t)class_idx);
-            // fallthrough to TinyHeap v1
-            __attribute__((fallthrough));
-        }
-        case TINY_ROUTE_HEAP: {
-            void* heap_ptr = NULL;
-            if (class_idx == 7) {
-                heap_ptr = tiny_c7_alloc_fast(size);
-            } else {
-                heap_ptr = tiny_heap_alloc_class_fast(tiny_heap_ctx_for_thread(), class_idx, size);
-            }
-            if (heap_ptr) {
-                return heap_ptr;
+
+        case SMALL_ROUTE_MID_V3: {
+            // Phase MID-V3: MID v3 allocation (257-768B, C5-C6)
+            // Note: MID v3 uses same segment infrastructure as MID v3.5
+            // For now, delegate to MID v3.5 which handles both
+            void* v3p = small_mid_v35_alloc(class_idx, size);
+            if (TINY_HOT_LIKELY(v3p != NULL)) {
+                return v3p;
             }
             break;
         }
-        case TINY_ROUTE_LEGACY:
+
+        case SMALL_ROUTE_LEGACY:
         default:
             break;
     }
 
-    // Legacy Tiny front
-    void* ptr = NULL;
-    if (!front_snap || front_snap->unified_cache_on) {
-        ptr = tiny_hot_alloc_fast(class_idx);
-    }
+    // LEGACY fallback: Unified Cache hot/cold path
+    void* ptr = tiny_hot_alloc_fast(class_idx);
     if (TINY_HOT_LIKELY(ptr != NULL)) {
         return ptr;
     }
@ -353,85 +263,58 @@ static inline int free_tiny_fast(void* ptr) {
     // Phase FREE-LEGACY-BREAKDOWN-1: counter instrumentation (1. function entry)
     FREE_PATH_STAT_INC(total_calls);
 
-    // C7 ULTRA free path (ENV: HAKMEM_TINY_C7_ULTRA_ENABLED, default ON)
-    if (tiny_class_is_c7(class_idx) && tiny_c7_ultra_enabled_env()) {
+    // Phase v11b-1: C7 ULTRA early-exit (skip policy snapshot for most common case)
+    if (class_idx == 7 && tiny_c7_ultra_enabled_env()) {
         tiny_c7_ultra_free(ptr);
         return 1;
     }
 
-    // Phase 4-2: C6 ULTRA-free (C6-only, free-only, ENV gated)
-    if (tiny_class_is_c6(class_idx) && tiny_c6_ultra_free_enabled()) {
-        tiny_c6_ultra_free_fast(base, class_idx);
-        return 1;
-    }
-
-    // Phase 5-1: C5 ULTRA-free (C5-only, free-only, ENV gated, same pattern as C6)
-    if (tiny_class_is_c5(class_idx) && tiny_c5_ultra_free_enabled()) {
-        tiny_c5_ultra_free_fast(base, class_idx);
-        return 1;
-    }
-
-    // Phase 6: C4 ULTRA-free (C4-only, free-only, ENV gated, same pattern as C5/C6)
-    if (tiny_class_is_c4(class_idx) && tiny_c4_ultra_free_enabled()) {
-        tiny_c4_ultra_free_fast(base, class_idx);
-        return 1;
-    }
-
-    // C7 v3 fast classify: bypass classify_ptr/ss_map_lookup for clear hits
-    if (class_idx == 7 &&
-        tiny_front_v3_enabled() &&
-        tiny_ptr_fast_classify_enabled() &&
-        small_heap_v3_c7_enabled() &&
-        smallobject_hotbox_v3_can_own_c7(base)) {
-        so_free(7, base);
-        // Phase FREE-LEGACY-BREAKDOWN-1: counter instrumentation (3. C7 v3 fast classify)
-        FREE_PATH_STAT_INC(smallheap_v3_fast);
-        return 1;
-    }
-
-    // Phase v11a-3: Try MID v3.5 free for C5/C6/C7
-    // For v11a-3: simple ownership check (assumes current route)
-    // For v11b: will add proper segment-based ownership check
+    // Phase v11b-1: Policy-based single switch (replaces serial ULTRA checks)
     const SmallPolicyV7* policy_free = small_policy_v7_snapshot();
-    if ((class_idx >= 5 && class_idx <= 7) &&
-        policy_free->route_kind[class_idx] == SMALL_ROUTE_MID_V35) {
-        small_mid_v35_free(ptr, class_idx);
-        FREE_PATH_STAT_INC(smallheap_v7_fast); // Reuse counter for now
-        return 1;
-    }
+    SmallRouteKind route_kind_free = policy_free->route_kind[class_idx];
 
-    // Phase v7-5b/v7-7: Always try V7 free for supported classes (C5/C6)
-    // V7 returns false if ptr is not in V7 segment.
-    // This is necessary because Learner may switch routes dynamically,
-    // but pointers allocated before the switch still need V7 free.
-    if (SMALL_V7_CLASS_SUPPORTED(class_idx)) {
-        if (small_heap_free_fast_v7_stub(ptr, (uint8_t)class_idx)) {
+    switch (route_kind_free) {
+        case SMALL_ROUTE_ULTRA: {
+            // Phase TLS-UNIFY-1: Unified ULTRA TLS push for C4-C6 (C7 handled above)
+            if (class_idx >= 4 && class_idx <= 6) {
+                tiny_ultra_tls_push((uint8_t)class_idx, base);
+                return 1;
+            }
+            // ULTRA for other classes → fallback to LEGACY
+            break;
+        }
+
+        case SMALL_ROUTE_MID_V35: {
+            // Phase v11a-3: MID v3.5 free
+            small_mid_v35_free(ptr, class_idx);
             FREE_PATH_STAT_INC(smallheap_v7_fast);
             return 1;
         }
-        // v7 returned false (ptr not in v7 segment) -> fallback to legacy below
+
+        case SMALL_ROUTE_V7: {
+            // Phase v7: SmallObject v7 free (research box)
+            if (small_heap_free_fast_v7_stub(ptr, (uint8_t)class_idx)) {
+                FREE_PATH_STAT_INC(smallheap_v7_fast);
+                return 1;
+            }
+            // V7 miss → fallback to LEGACY
+            break;
+        }
+
+        case SMALL_ROUTE_MID_V3: {
+            // Phase MID-V3: delegate to MID v3.5
+            small_mid_v35_free(ptr, class_idx);
+            FREE_PATH_STAT_INC(smallheap_v7_fast);
+            return 1;
+        }
+
+        case SMALL_ROUTE_LEGACY:
+        default:
+            break;
     }
 
+    // LEGACY fallback path
     tiny_route_kind_t route = tiny_route_for_class((uint8_t)class_idx);
-
-    // Phase v7-2: v7 early-exit for C6 (legacy path, now handled by Policy Box above)
-    // Kept for compatibility with old routing system
-    if (class_idx == 6 && route == TINY_ROUTE_SMALL_HEAP_V7) {
-        if (small_heap_free_fast_v7_stub(ptr, (uint8_t)class_idx)) {
-            FREE_PATH_STAT_INC(smallheap_v7_fast);
-            return 1;
-        }
-        // v7 returned false (ptr not in v7 segment) -> fallback to legacy below
-    }
-
-    if ((class_idx == 7 || class_idx == 6) &&
-        route == TINY_ROUTE_SMALL_HEAP_V4 &&
-        tiny_ptr_fast_classify_v4_enabled() &&
-        smallobject_hotbox_v4_can_own(class_idx, base)) {
-        small_heap_free_fast_v4(small_heap_ctx_v4_get(), class_idx, base);
-        return 1;
-    }
-
     const int use_tiny_heap = tiny_route_is_heap_kind(route);
     const TinyFrontV3Snapshot* front_snap =
         __builtin_expect(tiny_front_v3_enabled(), 0) ? tiny_front_v3_snapshot_get() : NULL;
@ -146,20 +146,27 @@ void small_policy_v7_init_from_env(SmallPolicyV7* policy) {
     }
 
     // Priority 1: ULTRA (highest priority, C4-C7)
-    // ENV: HAKMEM_TINY_C7_ULTRA_ENABLED (C7), HAKMEM_TINY_C6_ULTRA_FREE_ENABLED (C6), etc.
-    // Note: For now, we check individual ULTRA ENV vars
+    // Phase v11a-5: All ULTRA ENVs consolidated here (removed from hot path)
 
-    // C7 ULTRA (default ON)
+    // C7 ULTRA (default ON via HAKMEM_TINY_C7_ULTRA_ENABLED)
     if (env_enabled("HAKMEM_TINY_C7_ULTRA_ENABLED")) {
         policy->route_kind[7] = SMALL_ROUTE_ULTRA;
     }
 
-    // C6 ULTRA (if C6 ULTRA free is enabled, route to ULTRA)
-    // Note: This is a free-only optimization, not full ULTRA for C6 yet
-    // Keep C6 routing as-is (v7 or MID_v3) for now
+    // C6 ULTRA (via HAKMEM_TINY_C6_ULTRA_FREE_ENABLED - TLS freelist pop)
+    if (env_enabled("HAKMEM_TINY_C6_ULTRA_FREE_ENABLED")) {
+        policy->route_kind[6] = SMALL_ROUTE_ULTRA;
+    }
 
-    // C4-C5 ULTRA (if enabled via ENV)
-    // TODO: Add HAKMEM_TINY_C4_ULTRA_ENABLED / C5 when implemented
+    // C5 ULTRA (via HAKMEM_TINY_C5_ULTRA_FREE_ENABLED - TLS freelist pop)
+    if (env_enabled("HAKMEM_TINY_C5_ULTRA_FREE_ENABLED")) {
+        policy->route_kind[5] = SMALL_ROUTE_ULTRA;
+    }
+
+    // C4 ULTRA (via HAKMEM_TINY_C4_ULTRA_FREE_ENABLED - TLS freelist pop)
+    if (env_enabled("HAKMEM_TINY_C4_ULTRA_FREE_ENABLED")) {
+        policy->route_kind[4] = SMALL_ROUTE_ULTRA;
+    }
 
     // Debug output (if needed)
     static int g_debug_once = 0;
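The hunks above move every ENV lookup into policy initialization so that the hot path reads one table entry and runs one switch. A stripped-down sketch of that arrangement is given below, assuming a per-class route table filled once from the environment. `DemoPolicy`, `DEMO_C6_ULTRA`, and the `demo_*` functions are hypothetical names for this illustration and do not reflect the real `SmallPolicyV7` layout or `env_enabled` semantics.

```c
#include <stdlib.h>

typedef enum { DEMO_ROUTE_LEGACY = 0, DEMO_ROUTE_ULTRA = 1 } DemoRoute;

enum { DEMO_NUM_CLASSES = 8 };

typedef struct {
    DemoRoute route_kind[DEMO_NUM_CLASSES];
} DemoPolicy;

// Assumed convention for this sketch: a variable is "on" when it starts with '1'.
static int demo_env_on(const char* name) {
    const char* v = getenv(name);
    return v != NULL && v[0] == '1';
}

// All environment reads happen here, once, off the hot path.
static void demo_policy_init(DemoPolicy* p) {
    for (int i = 0; i < DEMO_NUM_CLASSES; i++) {
        p->route_kind[i] = DEMO_ROUTE_LEGACY;
    }
    if (demo_env_on("DEMO_C6_ULTRA")) {
        p->route_kind[6] = DEMO_ROUTE_ULTRA;
    }
}

// Hot path: one table read, one switch. Returns the route that was taken,
// or -1 for an out-of-range class.
static int demo_dispatch(const DemoPolicy* p, int class_idx) {
    if (class_idx < 0 || class_idx >= DEMO_NUM_CLASSES) {
        return -1;
    }
    switch (p->route_kind[class_idx]) {
        case DEMO_ROUTE_ULTRA:
            return DEMO_ROUTE_ULTRA;   // a real allocator would try the TLS fast path here
        case DEMO_ROUTE_LEGACY:
        default:
            return DEMO_ROUTE_LEGACY;
    }
}
```

The design trade-off is the usual one for snapshot-style configuration: the hot path gets cheaper, while any change to the environment only takes effect when the policy is rebuilt.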
@ -40,7 +40,11 @@ enum {
     // TLS SLL anomalies (investigation aid, gated by HAKMEM_TINY_SLL_RING)
     TINY_RING_EVENT_TLS_SLL_REJECT = 0x7F10,
     TINY_RING_EVENT_TLS_SLL_SENTINEL = 0x7F11,
-    TINY_RING_EVENT_TLS_SLL_HDR_CORRUPT = 0x7F12
+    TINY_RING_EVENT_TLS_SLL_HDR_CORRUPT = 0x7F12,
+    // C6 Intrusive Freelist (Phase TLS-UNIFY-3)
+    TINY_RING_EVENT_C6_IFL_PUSH = 0x7F20,
+    TINY_RING_EVENT_C6_IFL_POP = 0x7F21,
+    TINY_RING_EVENT_C6_IFL_EMPTY = 0x7F22 // pop miss (empty)
 };
 
 // Function declarations (implementation in tiny_debug_ring.c)
@ -39,6 +39,7 @@ HAKMEM_BENCH_MAX_SIZE=1024
 - `HAKMEM_TINY_FRONT_V3_LUT_ENABLED=1`
 - `HAKMEM_MID_V3_ENABLED=1` (Phase MID-V3: 257-768B, C6 only)
 - `HAKMEM_MID_V3_CLASSES=0x40` (C6 only; C7 is left to ULTRA)
+- `HAKMEM_MID_V35_ENABLED=0` (Phase v11a-5: MID v3.5 OFF is fastest for Mixed)
 
 ### Optional settings
 - To view stats:
@ -85,6 +86,8 @@ HAKMEM_POOL_V2_ENABLED=0
 HAKMEM_POOL_V1_FLATTEN_ENABLED=0 # flatten is OFF for the first run
 HAKMEM_MID_V3_ENABLED=1 # Phase MID-V3: 257-768B, C6 only
 HAKMEM_MID_V3_CLASSES=0x40 # C6 only (+11% on C6-heavy)
+HAKMEM_MID_V35_ENABLED=1 # Phase v11a-5: +8% improvement on C6-heavy
+HAKMEM_MID_V35_CLASSES=0x40 # C6 only (53.1M ops/s)
 ```
 - Only when trying the mid_desc_lookup TLS cache: additionally set `HAKMEM_MID_DESC_CACHE_ENABLED=1` (default is OFF).
 
46
docs/analysis/MIMALLOC_GAP_SUMMARY_V11A5.md
Normal file
@ -0,0 +1,46 @@
# mimalloc Gap Summary (Phase v11b-1)

## Current Status: 2025-12-12

### Throughput Comparison (Mixed 16-1024B, ws=400, 10M iter)

| Allocator | Throughput | vs mimalloc |
|-----------|------------|-------------|
| mimalloc | 65.5M ops/s | 1.00x |
| hakmem v11b-1 | 50.7M ops/s | **0.77x** |

### Progress Summary

| Phase | Throughput | vs mimalloc | Key Change |
|-------|------------|-------------|------------|
| v11a-4 | 38.6M | 0.59x | baseline |
| v11a-5 | 45.4M | 0.69x | alloc path: single switch + C7 early-exit |
| v11b-1 | 50.7M | **0.77x** | free path: single switch + C7 early-exit |

### perf stat Comparison (Mixed 16-1024B, v11a-5 data)

| Metric | mimalloc | hakmem | Ratio |
|--------|----------|--------|-------|
| cycles | ~500M | 1.04B | 2.1x |
| instructions | ~920M | 2.2B | 2.4x |
| cache-misses | ~90K | 408K | 4.5x |
| branch-misses | ~6.3M | 14.5M | 2.3x |

### Next Target

**Front-end optimization is complete for both alloc and free. Next: backend core or cache locality improvements.**

Candidates:
1. **cache locality**: cache-misses at 4.5x is the largest gap → TLS page prefetch, hot page reuse
2. **instruction reduction**: 2.4x → inlining, macro expansion
3. Design of the small band (C2-C3) for small-object v7

### Key Insight

- Converting both the alloc and free paths to a switch (jump table) is effective
- Front-layer optimization alone gave +31% from v11a-4 to v11b-1 (38.6M → 50.7M)
- The gap to mimalloc comes mainly from cache-misses (4.5x) and instructions (2.4x)

---
**Date**: 2025-12-12
**Phase**: v11b-1 complete
@ -435,4 +435,30 @@ v3 backend の so_alloc_fast/so_free_fast パスの「内部最適化」に進
|
|||||||
|
|
||||||
**Recommendation**: before implementing Phase SO-BACKEND-OPT-2, measure so_alloc_fast/so_free_fast in detail with a perf profile (cycles:u).

---

## Phase v11b-1: Free Path Micro-Optimization (2025-12-12)

### Changes

A perf profile showed that the serial ULTRA checks in `free_tiny_fast()` (C7→C6→C5→C4) accounted for 11.73% overhead. The same pattern used in `malloc_tiny_fast()` was applied:

1. **C7 ULTRA early-exit**: test for C7 before taking the policy snapshot (shortens the most frequent path)
2. **Single switch**: branch once on route_kind[class_idx] (generates a jump table)
3. **Dead code removal**: removed the unused v4 check and the duplicate v7 check

### Results

| Workload | v11a-5 | v11b-1 | Gain |
|----------|--------|--------|------|
| Mixed 16-1024B | 45.4M ops/s | 50.7M ops/s | **+11.7%** |
| C6-heavy | 49.1M ops/s | 52.0M ops/s | **+5.9%** |
| C6-heavy + MID v3.5 | 53.1M ops/s | 53.6M ops/s | +0.9% |

### Lessons

- The pattern from the alloc-path optimization (v11a-5) also works on the free path
- Serial if-else chain → switch (jump table) gives a large improvement
- Front-layer branch cost is larger than backend cost (+11.7% this time vs the expected +1-2%)
|
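
The single-switch dispatch described above can be sketched roughly as follows. This is only an illustration, not the code in `free_tiny_fast()`: the `sketch_route_t` values, the `sketch_route_kind` table, and `sketch_free_route()` are hypothetical names, and the table contents are an assumed policy snapshot. Whether the compiler actually emits a jump table depends on the compiler and optimization level.

```c
#include <stdint.h>

/* Hypothetical route kinds (assumption: the real enum in hakmem differs). */
typedef enum {
    SKETCH_ROUTE_LEGACY = 0,
    SKETCH_ROUTE_C4_ULTRA,
    SKETCH_ROUTE_C5_ULTRA,
    SKETCH_ROUTE_C6_ULTRA,
    SKETCH_ROUTE_C7_ULTRA
} sketch_route_t;

enum { SKETCH_NUM_CLASSES = 8 };

/* Assumed policy snapshot: one route per size class, filled once at init. */
static const uint8_t sketch_route_kind[SKETCH_NUM_CLASSES] = {
    SKETCH_ROUTE_LEGACY,   SKETCH_ROUTE_LEGACY,   SKETCH_ROUTE_LEGACY,   SKETCH_ROUTE_LEGACY,
    SKETCH_ROUTE_C4_ULTRA, SKETCH_ROUTE_C5_ULTRA, SKETCH_ROUTE_C6_ULTRA, SKETCH_ROUTE_LEGACY
};

/* Returns the route taken; real code would call the per-class free handler
 * inside each case instead of returning a tag. */
static inline sketch_route_t sketch_free_route(unsigned class_idx) {
    if (class_idx == 7) return SKETCH_ROUTE_C7_ULTRA;          /* early exit, no table load */
    if (class_idx >= SKETCH_NUM_CLASSES) return SKETCH_ROUTE_LEGACY;
    switch (sketch_route_kind[class_idx]) {                     /* one branch instead of a chain */
    case SKETCH_ROUTE_C4_ULTRA: return SKETCH_ROUTE_C4_ULTRA;
    case SKETCH_ROUTE_C5_ULTRA: return SKETCH_ROUTE_C5_ULTRA;
    case SKETCH_ROUTE_C6_ULTRA: return SKETCH_ROUTE_C6_ULTRA;
    default:                    return SKETCH_ROUTE_LEGACY;
    }
}
```

The point of the shape is that the most frequent class leaves before any table lookup, and every other class pays for exactly one indexed load and one branch.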
||||||
|
|
||||||
***
|
***
|
||||||
|
|||||||
177
docs/analysis/TLS_LAYOUT_V11B1_CURRENT.md
Normal file
@ -0,0 +1,177 @@
|
|||||||
|
# TLS Layout Analysis (Phase v11b-1 Current State)

## Overview

The current ULTRA TLS is scattered across **independent per-class structs**, which makes poor use of the L1D cache.

## TLS Structures (ULTRA C4-C7)

### 1. TinyC7UltraFreeTLS (`tiny_c7_ultra_tls_t`)

```c
typedef struct tiny_c7_ultra_tls_t {
    uint16_t count;                        // 2B (hot)
    uint16_t _pad;                         // 2B
    void* freelist[128];                   // 1024B (128 * 8)
    // --- cold fields ---
    uintptr_t seg_base;                    // 8B
    uintptr_t seg_end;                     // 8B
    tiny_c7_ultra_segment_t* seg;          // 8B
    void* page_base;                       // 8B
    size_t block_size;                     // 8B
    uint32_t page_idx;                     // 4B
    tiny_c7_ultra_page_meta_t* page_meta;  // 8B
    bool headers_initialized;              // 1B
} tiny_c7_ultra_tls_t;
```

| Field | Size | Total |
|-------|------|-------|
| Hot (count + freelist) | 4 + 1024 | 1028B |
| Cold (seg_base...headers_initialized) | ~53B | ~53B |
| **Total** | | **~1080B (17 cache lines)** |

Sizes here are raw field sums; compiler alignment padding is not counted.
|
||||||
|
|
||||||
|
### 2. TinyC6UltraFreeTLS

```c
typedef struct TinyC6UltraFreeTLS {
    void* freelist[128];   // 1024B (128 * 8)
    uint8_t count;         // 1B
    uint8_t _pad[7];       // 7B
    uintptr_t seg_base;    // 8B
    uintptr_t seg_end;     // 8B
} TinyC6UltraFreeTLS;
```

| Field | Size |
|-------|------|
| freelist | 1024B |
| count + pad | 8B |
| seg_base/end | 16B |
| **Total** | **1048B (17 cache lines)** |

### 3. TinyC5UltraFreeTLS

Same as C6: **1048B (17 cache lines)**

### 4. TinyC4UltraFreeTLS

```c
typedef struct TinyC4UltraFreeTLS {
    void* freelist[64];    // 512B (64 * 8)
    uint8_t count;         // 1B
    uint8_t _pad[7];       // 7B
    uintptr_t seg_base;    // 8B
    uintptr_t seg_end;     // 8B
} TinyC4UltraFreeTLS;
```

| Field | Size |
|-------|------|
| freelist | 512B |
| count + pad | 8B |
| seg_base/end | 16B |
| **Total** | **536B (9 cache lines)** |

### 5. SmallMidV35TlsCtx (MID v3.5)

```c
typedef struct {
    void *page[8];                   // 64B
    uint32_t offset[8];              // 32B
    uint32_t capacity[8];            // 32B
    SmallPageMeta_MID_v3 *meta[8];   // 64B
} SmallMidV35TlsCtx;
```

| Field | Size |
|-------|------|
| page[8] | 64B |
| offset[8] | 32B |
| capacity[8] | 32B |
| meta[8] | 64B |
| **Total** | **192B (3 cache lines)** |
|
||||||
|
|
||||||
|
## Summary: Total TLS Footprint

| Structure | Size | Cache Lines |
|-----------|------|-------------|
| TinyC7UltraFreeTLS | 1080B | 17 |
| TinyC6UltraFreeTLS | 1048B | 17 |
| TinyC5UltraFreeTLS | 1048B | 17 |
| TinyC4UltraFreeTLS | 536B | 9 |
| SmallMidV35TlsCtx | 192B | 3 |
| **Total ULTRA (C4-C7)** | **3712B** | **~60 lines** |
|
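
The cache-line counts in these tables follow from rounding each size up to a multiple of 64 bytes. The sketch below shows that arithmetic and mirrors the C6 struct so its size can be checked with `sizeof`. The names `sketch_cache_lines` and `C6UltraFreeTlsSketch` are made up for this example, and the 64-byte line size is an assumption about the target CPU.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64u   /* assumed line size */

/* Number of cache lines needed to cover `bytes` (round up). */
static inline size_t sketch_cache_lines(size_t bytes) {
    return (bytes + SKETCH_CACHE_LINE - 1) / SKETCH_CACHE_LINE;
}

/* Mirror of the C6 struct listed above; fields copied from the document. */
typedef struct {
    void*     freelist[128];
    uint8_t   count;
    uint8_t   _pad[7];
    uintptr_t seg_base;
    uintptr_t seg_end;
} C6UltraFreeTlsSketch;
```

On an LP64 target `sizeof(C6UltraFreeTlsSketch)` is 1048, which `sketch_cache_lines` rounds up to 17 lines. Note that this counts lines touched by a line-aligned struct; an unaligned TLS block can straddle one more.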
||||||
|
|
||||||
|
## Problem Analysis

### 1. Minimum fields needed on the hot path

| Operation | Required Fields |
|-----------|-----------------|
| alloc (TLS hit) | count, freelist[count-1] |
| free (TLS push) | count, freelist[count], seg_base/end |

**The hot path effectively needs only count + head + seg_range, about 24B.**

### 2. Current problems

1. **Huge freelist arrays**: each class keeps a 512-1024B array in TLS
2. **seg_base/end duplicated across classes**: could be shared if C4-C7 use the same segment range
3. **Inconsistent count placement**: first field in C7, after the freelist in C4-C6
4. **Cold fields mixed into the hot region**: C7's page_meta and similar fields are loaded every time

### 3. Causes of cache misses

- Every alloc/free touches a **class-specific TLS struct**
- 4 classes × 15 cache lines on average = **about 60 cache lines competing for L1D**
- In a Mixed workload, C4-C7 alternate randomly, which causes thrashing
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase TLS-UNIFY-2a: C4-C6 Unified TLS (2025-12-12)

### Implementation

The C4-C6 ULTRA TLS is merged into a single `TinyUltraTlsCtx` box:

```c
typedef struct TinyUltraTlsCtx {
    // Hot line: counts (8B aligned)
    uint16_t c4_count;
    uint16_t c5_count;
    uint16_t c6_count;
    uint16_t _pad_count;

    // Per-class segment ranges (learned on first free)
    uintptr_t c4_seg_base, c4_seg_end;
    uintptr_t c5_seg_base, c5_seg_end;
    uintptr_t c6_seg_base, c6_seg_end;

    // Per-class array magazines
    void* c4_freelist[64];   // 512B
    void* c5_freelist[64];   // 512B
    void* c6_freelist[128];  // 1024B
} TinyUltraTlsCtx;
// Total: ~2KB per thread
```

**Changes**:
- C4/C5/C6 TLS merged into one struct
- Array magazine scheme retained (safe)
- C7 stays in its own box (already stable)
- Delegation to the old `TinyC4/5/6UltraFreeTLS` removed
|
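
For readers unfamiliar with the term, an "array magazine" is a bounded stack of pointers plus a count. The following is a minimal sketch of that scheme for one class; `MagSketch`, `mag_push`, and `mag_pop` are hypothetical names, and the capacity of 128 is taken from `c6_freelist[128]` above. The real box also performs the segment range check, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_MAG_CAP = 128 };  /* matches c6_freelist[128] */

typedef struct {
    uint16_t count;                   /* number of cached pointers */
    void*    slots[SKETCH_MAG_CAP];   /* LIFO storage */
} MagSketch;

/* Push a freed block; returns 0 when the magazine is full (slow path needed). */
static inline int mag_push(MagSketch* m, void* p) {
    if (m->count >= SKETCH_MAG_CAP) return 0;
    m->slots[m->count++] = p;
    return 1;
}

/* Pop the most recently freed block, or NULL when empty. */
static inline void* mag_pop(MagSketch* m) {
    return m->count ? m->slots[--m->count] : NULL;
}
```

The array form never writes into the freed block itself, which is why the document calls it the safe option; the cost is the fixed per-thread array.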
||||||
|
|
||||||
|
### A/B Test Results

| Test | v11b-1 (Phase 1) | TLS-UNIFY-2a | Diff |
|------|------------------|--------------|------|
| Mixed 16-1024B | 8.0-8.8 Mop/s | 8.5-9.0 Mop/s | +0~5% |
| MID 257-768B | 8.5-9.0 Mop/s | 8.1-9.0 Mop/s | ±0% |

**Result**: performance equal or better, no SEGV or assert failures ✅

---

**Date**: 2025-12-12
**Phase**: TLS-UNIFY-2a completed
|
||||||
158
docs/analysis/TLS_LAYOUT_V11B1_PLAN.md
Normal file
@ -0,0 +1,158 @@
|
|||||||
|
# TLS Layout Plan: Unified ULTRA TLS (Phase v11b-2 Target)

## Goal

Fit the C4-C7 ULTRA hot path into **1 cache line (64B)**.

## Design: TinyUltraTlsCtx

```c
// ============================================================================
// LINE 1: Hot fields (64B) - alloc/free hot path
// ============================================================================
typedef struct TinyUltraTlsCtx {
    // Counts (8B total, padded for alignment)
    uint16_t c4_count;      // 2B
    uint16_t c5_count;      // 2B
    uint16_t c6_count;      // 2B
    uint16_t c7_count;      // 2B

    // Freelist heads (32B)
    void* c4_head;          // 8B - next free slot for C4
    void* c5_head;          // 8B
    void* c6_head;          // 8B
    void* c7_head;          // 8B

    // Segment range (shared across C4-C7, 16B)
    uintptr_t seg_base;     // 8B
    uintptr_t seg_end;      // 8B

    // ========== LINE 1 END: 56B used, 8B spare ==========

    uint64_t _hot_pad;      // 8B - align to 64B

    // ============================================================================
    // LINE 2+: Cold fields (refill/retire, debug, stats)
    // ============================================================================

    // Freelist tails (for bulk push, 32B)
    void* c4_tail;          // 8B
    void* c5_tail;          // 8B
    void* c6_tail;          // 8B
    void* c7_tail;          // 8B

    // Segment metadata (16B)
    void* segment;          // 8B - owning segment pointer
    uint32_t page_idx;      // 4B - current page index
    uint32_t _cold_pad;     // 4B

    // Stats (optional, 16B)
    uint64_t alloc_count;   // 8B
    uint64_t free_count;    // 8B

} TinyUltraTlsCtx;
// Total: 128B (2 cache lines)
```
|
||||||
|
|
||||||
|
## Memory Layout
|
||||||
|
|
||||||
|
```
|
||||||
|
Offset  Field        Size  Cache Line
------  -----        ----  ----------
0x00    c4_count     2B    LINE 1 (HOT)
0x02    c5_count     2B    LINE 1
0x04    c6_count     2B    LINE 1
0x06    c7_count     2B    LINE 1
0x08    c4_head      8B    LINE 1
0x10    c5_head      8B    LINE 1
0x18    c6_head      8B    LINE 1
0x20    c7_head      8B    LINE 1
0x28    seg_base     8B    LINE 1
0x30    seg_end      8B    LINE 1
0x38    _hot_pad     8B    LINE 1
------  -----        ----  ----------
0x40    c4_tail      8B    LINE 2 (COLD)
0x48    c5_tail      8B    LINE 2
0x50    c6_tail      8B    LINE 2
0x58    c7_tail      8B    LINE 2
0x60    segment      8B    LINE 2
0x68    page_idx     4B    LINE 2
0x6C    _cold_pad    4B    LINE 2
0x70    alloc_count  8B    LINE 2
0x78    free_count   8B    LINE 2
|
||||||
|
```
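
A layout like this is easy to get subtly wrong, so it may be worth pinning the offsets at compile time. The sketch below is one way to do that, assuming a C11 compiler and a 64-bit target; `PlanCtxSketch` and `sketch_line_of` are names invented for this example. A 64-byte line index is simply `offset / 64`, so offsets 0x40 through 0x7F all fall in the second line.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy of the planned struct, fields in the same order as the design above. */
typedef struct {
    uint16_t  c4_count, c5_count, c6_count, c7_count;
    void*     c4_head; void* c5_head; void* c6_head; void* c7_head;
    uintptr_t seg_base, seg_end;
    uint64_t  _hot_pad;
    void*     c4_tail; void* c5_tail; void* c6_tail; void* c7_tail;
    void*     segment;
    uint32_t  page_idx, _cold_pad;
    uint64_t  alloc_count, free_count;
} PlanCtxSketch;

/* Zero-based cache line index of a byte offset, assuming 64-byte lines. */
static inline size_t sketch_line_of(size_t offset) { return offset / 64; }

#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu   /* offsets below assume 8-byte pointers */
_Static_assert(offsetof(PlanCtxSketch, c4_head)    == 0x08, "heads start at 0x08");
_Static_assert(offsetof(PlanCtxSketch, _hot_pad)   == 0x38, "hot pad closes line 1");
_Static_assert(offsetof(PlanCtxSketch, c4_tail)    == 0x40, "cold fields start line 2");
_Static_assert(offsetof(PlanCtxSketch, free_count) == 0x78, "last field ends line 2");
_Static_assert(sizeof(PlanCtxSketch) == 128, "two cache lines in total");
#endif
```

The hot line is only a single cache line at run time if the TLS instance itself is 64-byte aligned, for example via an alignment attribute on the variable; the struct definition alone does not guarantee that.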
|
||||||
|
|
||||||
|
## Hot Path Access Pattern
|
||||||
|
|
||||||
|
### alloc (TLS hit)
|
||||||
|
|
||||||
|
```c
|
||||||
|
static inline void* tiny_ultra_alloc_fast(TinyUltraTlsCtx* ctx, uint8_t class_idx) {
    // Single cache line access: counts and heads both live in LINE 1.
    // Indexing from &ctx->c4_count / &ctx->c4_head relies on the c4..c7
    // fields being laid out contiguously, as in the struct above.
    uint16_t* counts = &ctx->c4_count;
    void**    heads  = &ctx->c4_head;
    unsigned  i      = (unsigned)class_idx - 4;

    uint16_t c = counts[i];
    if (likely(c > 0)) {
        void* blk = heads[i];
        heads[i]  = *(void**)blk;   // pop: advance head to the stored next pointer
        counts[i] = c - 1;
        return blk;
    }
    return tiny_ultra_alloc_slow(ctx, class_idx);
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### free (TLS push)
|
||||||
|
|
||||||
|
```c
|
||||||
|
static inline void tiny_ultra_free_fast(TinyUltraTlsCtx* ctx, void* ptr, uint8_t class_idx) {
    // Range check (seg_base/end in same cache line)
    uintptr_t p = (uintptr_t)ptr;
    if (likely(p >= ctx->seg_base && p < ctx->seg_end)) {
        // Push to freelist (single cache line)
        void** heads = &ctx->c4_head;
        uint16_t* counts = &ctx->c4_count;

        *(void**)ptr = heads[class_idx - 4];
        heads[class_idx - 4] = ptr;
        counts[class_idx - 4]++;
        return;
    }
    tiny_ultra_free_slow(ctx, ptr, class_idx);
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Comparison: Before vs After
|
||||||
|
|
||||||
|
| Metric | Current (v11b-1) | Unified (v11b-2) |
|--------|------------------|------------------|
| TLS size (C4-C7) | 3712B | 128B |
| Cache lines (hot) | ~60 | **1** |
| seg_base/end copies | 4 | 1 |
| count access | scattered | contiguous |
|
||||||
|
|
||||||
|
## Freelist Design: Linked List vs Array
|
||||||
|
|
||||||
|
**Choice: Linked List (head/tail)**

Reasons:
1. **No fixed array**: removes the 1KB of freelist[128]
2. **O(1) push/pop**: the head alone is enough
3. **Bulk drain**: with a tail, the whole list can be returned at once
4. **Memory efficiency**: link storage exists only in blocks that are actually on the list

Trade-offs:
- Harder to prefetch (an array allows sequential access)
- Spatial locality may get worse

→ An array version can be reconsidered after profiling

## Implementation Notes

1. **Backward Compatibility**: keep the existing TinyC*UltraFreeTLS API while using TinyUltraTlsCtx internally
2. **Gradual Migration**: move C7 to the new structure first and measure the effect
3. **ENV Gate**: enabled with `HAKMEM_ULTRA_UNIFIED_TLS=1`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Date**: 2025-12-12
|
||||||
|
**Phase**: v11b-2 planning
|
||||||
268
docs/analysis/ULTRA_C6_INTRUSIVE_FREELIST_DESIGN_V11B.md
Normal file
@ -0,0 +1,268 @@
|
|||||||
|
# ULTRA C6 Intrusive Freelist Design (Phase TLS-UNIFY-3)

## Overview

This design moves the C6 ULTRA TLS freelist from the array magazine scheme to an intrusive LIFO.
The header is kept, and the next pointer is stored in the first 8B of the user area of a freed block.

## 1. Block Layout

### Current (C6 array magazine)

```
base[0]:   header (1B, class_idx=6)
base[1..]: user data (511B usable)

TLS: void* c6_freelist[128];  // 1KB array
     uint16_t c6_count;
```

### Proposed (C6 intrusive LIFO)

```
[Allocated block]
base[0]:    header (1B, class_idx=6)    ← unchanged
base[1..]:  user data (511B usable)     ← unchanged

[Freed block in TLS]
base[0]:    header (1B, class_idx=6)    ← kept
base[1..8]: next pointer (8B)           ← intrusive link
base[9..]:  garbage / unused

TLS: void* c6_head;      // LIFO head (BASE pointer)
     uint16_t c6_count;  // optional: for cap enforcement
```
|
||||||
|
|
||||||
|
### Design Decisions

| Item | Choice | Reason |
|------|--------|--------|
| header | **kept** | consistent with base→user (+1), the classification logic, and HAK_RET_ALLOC_* |
| next position | start of user area (base+1) | matches the existing pattern in tiny_nextptr.h |
| alignment | 8B aligned | the next pointer can be accessed naturally |
|
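
To make the "next pointer at base+1" layout concrete, here is a small sketch of a load/store pair. It is not the contents of `tiny_nextptr.h`, which this document does not show; `sketch_next_store` and `sketch_next_load` are hypothetical stand-ins. The sketch uses `memcpy` so that it stays well-defined even when `base + 1` is not pointer-aligned, which is an assumption about what a portable version would need.

```c
#include <stdint.h>
#include <string.h>

enum { SKETCH_NEXT_OFFSET = 1 };  /* next pointer sits right after the 1-byte header */

/* Store `next` into the freed block's user area without touching the header. */
static inline void sketch_next_store(void* base, void* next) {
    memcpy((uint8_t*)base + SKETCH_NEXT_OFFSET, &next, sizeof next);
}

/* Load the next pointer previously stored in the freed block. */
static inline void* sketch_next_load(const void* base) {
    void* next;
    memcpy(&next, (const uint8_t*)base + SKETCH_NEXT_OFFSET, sizeof next);
    return next;
}
```

Compilers typically lower a fixed-size `memcpy` like this to a single load or store on targets that tolerate unaligned access, so the portable form should not add cost on the hot path.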
||||||
|
|
||||||
|
### C6 Slots per Page

```
Page size:  64 KiB = 65536B
Block size: 512B (including 1B header)
Slots/page: 65536 / 512 = 128 slots
```
|
||||||
|
|
||||||
|
## 2. C6-only ULTRA Lane API
|
||||||
|
|
||||||
|
### C6IntrusiveFreeListBox (L1 box)

**Important**: always access the next pointer through `tiny_next_store/load()` (direct `*(void**)` access is forbidden).

```c
// core/box/c6_intrusive_freelist_box.h

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include "../tiny_nextptr.h"

// C6-specific wrappers (delegate to tiny_next_*, the single source of truth)
static inline void* c6_ifl_next_load(void* base) {
    return tiny_next_load(base, 6);  // off=1 (start of user area)
}

static inline void c6_ifl_next_store(void* base, void* next) {
    tiny_next_store(base, 6, next);  // off=1 (start of user area)
}

// Pop from intrusive LIFO
static inline void* c6_ifl_pop(void** head, uint16_t* count) {
    void* base = *head;
    if (base == NULL) return NULL;
    *head = c6_ifl_next_load(base);
    (*count)--;
    return base;
}

// Push to intrusive LIFO
static inline void c6_ifl_push(void** head, uint16_t* count, void* base) {
    c6_ifl_next_store(base, *head);
    *head = base;
    (*count)++;
}

// Check if empty
static inline bool c6_ifl_empty(void* head) {
    return head == NULL;
}
```
|
||||||
|
|
||||||
|
### New API (v12 intrusive)

```c
// Alloc: pop from intrusive LIFO
void* c6_ultra_alloc_intrusive(void) {
    TinyUltraTlsCtx* ctx = tiny_ultra_tls_ctx();
    // Returns NULL if empty → caller falls back to the slow path
    return c6_ifl_pop(&ctx->c6_head, &ctx->c6_count);
}

// Free: push to intrusive LIFO
void c6_ultra_free_intrusive(void* base) {
    TinyUltraTlsCtx* ctx = tiny_ultra_tls_ctx();
    uintptr_t addr = (uintptr_t)base;

    // Segment range check + capacity check
    if (ctx->c6_seg_base != 0 &&
        addr >= ctx->c6_seg_base &&
        addr < ctx->c6_seg_end &&
        ctx->c6_count < TINY_ULTRA_C6_CAP) {
        c6_ifl_push(&ctx->c6_head, &ctx->c6_count, base);
        return;
    }
    // Slow path: fallback to segment free
    c6_ultra_free_intrusive_slow(base);
}
```
|
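
The alloc/free pair above depends on helpers that live elsewhere in the tree. As a way to see the whole mechanism in one place, here is a self-contained toy version: every name in it (`ToyC6Ctx`, `toy_free`, `toy_alloc`, `TOY_CAP`) is invented for illustration, the capacity is shrunk to 2 so the cap path is easy to hit, and the next pointer is stored with `memcpy` at offset 1 as an assumption about how the real accessor behaves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_BLOCK_SIZE = 512, TOY_NEXT_OFFSET = 1, TOY_CAP = 2 };

typedef struct {
    void*     head;                /* LIFO head (base pointer), NULL when empty */
    uint16_t  count;               /* blocks currently on the list */
    uintptr_t seg_base, seg_end;   /* learned segment range, 0 = not learned */
} ToyC6Ctx;

static void toy_next_store(void* base, void* next) {
    memcpy((uint8_t*)base + TOY_NEXT_OFFSET, &next, sizeof next);
}

static void* toy_next_load(const void* base) {
    void* next;
    memcpy(&next, (const uint8_t*)base + TOY_NEXT_OFFSET, sizeof next);
    return next;
}

/* Returns 1 if the block was pushed, 0 if the caller must take the slow path. */
static int toy_free(ToyC6Ctx* ctx, void* base) {
    uintptr_t addr = (uintptr_t)base;
    if (ctx->seg_base == 0 || addr < ctx->seg_base || addr >= ctx->seg_end ||
        ctx->count >= TOY_CAP) {
        return 0;
    }
    toy_next_store(base, ctx->head);
    ctx->head = base;
    ctx->count++;
    return 1;
}

/* Pops the most recently freed block, or returns NULL when the list is empty. */
static void* toy_alloc(ToyC6Ctx* ctx) {
    void* base = ctx->head;
    if (base == NULL) return NULL;
    ctx->head = toy_next_load(base);
    ctx->count--;
    return base;
}
```

With three 512-byte blocks in one arena, the first two frees are accepted, the third is rejected by the cap, and allocation returns the blocks in reverse order of freeing while leaving byte 0 (the header) untouched.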
||||||
|
|
||||||
|
### Relationship to the Existing C6 ULTRA

| Phase | C6 ULTRA scheme | ENV gate |
|-------|-----------------|----------|
| v11 (current) | array magazine | `HAKMEM_C6_ULTRA_ENABLED=1` |
| v12 (new) | intrusive LIFO | `HAKMEM_C6_ULTRA_V12=1` |

- v11 and v12 are mutually exclusive (both ON is undefined)
- During the A/B test period, switch between them via ENV
- Once stable, promote v12 to the default
|
||||||
|
|
||||||
|
## 3. Migration Plan

### Phase 1: Implement the v12 lane (separate ENV)

- Enabled with `HAKMEM_C6_ULTRA_V12=1`
- The existing C6 ULTRA (v11 array) stays in use with `HAKMEM_C6_ULTRA_V12=0`
- Add a `c6_head` field to TinyUltraTlsCtx (for v12)
- Branch between v12 and v11 in the policy route

### Phase 2: A/B test

- Compare v11 vs v12 on a C6-heavy workload
- Goal: v12 performs at least as well as v11
- Memory efficiency: smaller TLS (1KB array → 8B head)

### Phase 3: Promote v12

- Once v12 is stable, make `HAKMEM_C6_ULTRA_V12=1` the default
- The v11 array scheme becomes deprecated → removed later
|
||||||
|
|
||||||
|
## 4. Consistency with TLS Unification

### Extending TinyUltraTlsCtx

```c
typedef struct TinyUltraTlsCtx {
    // Hot line: counts
    uint16_t c4_count;
    uint16_t c5_count;
    uint16_t c6_count;
    uint16_t _pad_count;

    // Per-class segment ranges
    uintptr_t c4_seg_base, c4_seg_end;
    uintptr_t c5_seg_base, c5_seg_end;
    uintptr_t c6_seg_base, c6_seg_end;

    // C4/C5: array magazine (unchanged)
    void* c4_freelist[64];
    void* c5_freelist[64];

    // C6: intrusive LIFO (v12) replaces the 1KB array.
    // Option A: declare only `void* c6_head;` and drop c6_freelist[128].
    // Option B (shown): keep both layouts behind conditional compilation.
#if HAKMEM_C6_ULTRA_V12
    void* c6_head;            // NEW: intrusive head
#else
    void* c6_freelist[128];   // v11 array magazine
#endif
} TinyUltraTlsCtx;
```
|
||||||
|
|
||||||
|
### Policy for C4/C5

- **Keep the array magazine for now**
- Consider C4/C5 after the C6 intrusive list is confirmed to work
- For C4 (128B) / C5 (256B), the 8B next pointer is a larger relative overhead
  - C4: 8/128 = 6.25% overhead
  - C5: 8/256 = 3.125% overhead
  - C6: 8/512 = 1.56% overhead ← most efficient
|
||||||
|
|
||||||
|
## 5. Compatibility and Future Extensions

### This intrusive design does not break the header

- base[0] = header (1B) is kept as before
- hak_base_to_user() / hak_user_to_base() remain +1/-1
- HAK_RET_ALLOC_* / HAK_BASE_FROM_RAW need no changes
- The tiny_header_* macros keep working as before
- C0-C6 classification on the free side still works by reading the header

### Future: a truly headerless C6 intrusive list (TLS-UNIFY-3b)

If a truly headerless layout is wanted, it should be studied in a separate phase:

1. **Lane parameterization**: make `tiny_user_offset` variable per class
   - C0-C5: offset=1 (with header)
   - C6 v12b: offset=0 (headerless)

2. **Branching pointer conversion**: make `hak_base_to_user()` lane-aware
   ```c
   static inline void* hak_base_to_user(void* base, int class_idx) {
       int offset = (class_idx == 6 && c6_headerless_enabled()) ? 0 : 1;
       return (uint8_t*)base + offset;
   }
   ```

3. **Redesign of the classification boundary**: how to determine the class of a headerless block
   - Option A: side metadata (record the class in the page header)
   - Option B: address range check (segment→class mapping)

4. **A/B test**: headerless C6 vs header-maintained C6

**Conclusion**: going headerless is major surgery. For now, proceed safely with header kept + intrusive list.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Implementation Instructions (Phase TLS-UNIFY-3-IMPL)

Build up in small steps, in this order:

1. **Add the ENV guard**
   - `HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1` (default OFF)
   - Coexists with the existing C6 ULTRA

2. **Create C6IntrusiveFreeListBox**
   - `core/box/c6_intrusive_freelist_box.h`
   - static inline `c6_ifl_push/pop/empty` only
   - Always go through `tiny_next_store/load(base, 6, ...)` (no direct writes)

3. **Extend TinyUltraTlsCtx**
   - Add `void* c6_head;` (intrusive LIFO head)
   - The existing `c6_count` can be reused
   - Touch it only at the init/reset/slow-path boundaries

4. **Implement c6_ultra_alloc/free_intrusive()**
   - ENV_OFF or empty → Fail-Fast fallback to the existing magazine/ULTRA
   - Pin the adoption boundary to `tiny_refill_try_fast()` / `superslab_refill()`
   - drain→bind→owner happens only in refill (the intrusive box has zero side effects)

5. **Observability**
   - Ring events `C6_IFL_PUSH/POP/REFILL_EMPTY`
   - One-shot logging for abnormal cases only
   - push/pop/fallback counters

6. **Health check**
   - Run `scripts/verify_health_profiles.sh` once each with ENV_OFF and ENV_ON
   - If the results differ, suspect an L1/L2 boundary violation
|
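
Step 1 above calls for a default-OFF ENV guard. One common shape for such a gate is a lazily cached `getenv` lookup, sketched below. Only the variable name `HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL` comes from this document; the function names and the parsing rule (unset, empty, or a value starting with `0` means OFF) are assumptions, and the real `tiny_c6_ultra_intrusive_env_box.h` may differ.

```c
#include <stdlib.h>

/* Assumed parsing rule: unset, empty, or a leading '0' means OFF; anything else ON. */
static int sketch_env_gate_parse(const char* v) {
    return (v != NULL && v[0] != '\0' && v[0] != '0') ? 1 : 0;
}

/* Reads the environment once and caches the result for later calls. */
static int sketch_c6_intrusive_fl_enabled(void) {
    static int cached = -1;   /* -1 = not read yet */
    if (cached < 0) {
        cached = sketch_env_gate_parse(getenv("HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL"));
    }
    return cached;
}
```

The cached flag is written without synchronization; concurrent first calls would all compute the same value, but a strict implementation might prefer an atomic or a one-time init hook.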
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Date**: 2025-12-12
|
||||||
|
**Phase**: TLS-UNIFY-3-DESIGN
|
||||||
|
**Status**: Design document (implementation in next session)
|
||||||
2
hakmem.d
@ -127,6 +127,7 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
|
|||||||
core/box/../front/../box/tiny_c5_ultra_free_env_box.h \
|
core/box/../front/../box/tiny_c5_ultra_free_env_box.h \
|
||||||
core/box/../front/../box/tiny_c4_ultra_free_box.h \
|
core/box/../front/../box/tiny_c4_ultra_free_box.h \
|
||||||
core/box/../front/../box/tiny_c4_ultra_free_env_box.h \
|
core/box/../front/../box/tiny_c4_ultra_free_env_box.h \
|
||||||
|
core/box/../front/../box/tiny_ultra_tls_box.h \
|
||||||
core/box/../front/../box/tiny_ultra_classes_box.h \
|
core/box/../front/../box/tiny_ultra_classes_box.h \
|
||||||
core/box/../front/../box/tiny_legacy_fallback_box.h \
|
core/box/../front/../box/tiny_legacy_fallback_box.h \
|
||||||
core/box/../front/../box/tiny_front_v3_env_box.h \
|
core/box/../front/../box/tiny_front_v3_env_box.h \
|
||||||
@ -343,6 +344,7 @@ core/box/../front/../box/tiny_c5_ultra_free_box.h:
|
|||||||
core/box/../front/../box/tiny_c5_ultra_free_env_box.h:
|
core/box/../front/../box/tiny_c5_ultra_free_env_box.h:
|
||||||
core/box/../front/../box/tiny_c4_ultra_free_box.h:
|
core/box/../front/../box/tiny_c4_ultra_free_box.h:
|
||||||
core/box/../front/../box/tiny_c4_ultra_free_env_box.h:
|
core/box/../front/../box/tiny_c4_ultra_free_env_box.h:
|
||||||
|
core/box/../front/../box/tiny_ultra_tls_box.h:
|
||||||
core/box/../front/../box/tiny_ultra_classes_box.h:
|
core/box/../front/../box/tiny_ultra_classes_box.h:
|
||||||
core/box/../front/../box/tiny_legacy_fallback_box.h:
|
core/box/../front/../box/tiny_legacy_fallback_box.h:
|
||||||
core/box/../front/../box/tiny_front_v3_env_box.h:
|
core/box/../front/../box/tiny_front_v3_env_box.h:
|
||||||
|
|||||||