diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index b149cac4..357dfa94 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -244,9 +244,91 @@ Added a feature that lets the Tiny/Mid boundary be adjusted dynamically via an ENV variable:
 - **Closes the gap**: fills the range between Tiny (6M) and Mid (?)
 - **Sound under Box Theory**: clear boundaries, one-way dependencies, A/B testable
 
-### 5.6 Implementation status
+### 5.6 Phase 17-1 implementation results (completed 2025-11-16)
 
-🚧 **IN PROGRESS** - design direction settled, preparing to start implementation
+**Strategy**: TLS frontend cache only (delegates to the Tiny backend)
+- Size classes: reduced from 5 to 3 (256B/512B/1KB only)
+- Backend: delegates to Tiny C5/C6/C7, with header conversion (0xa0 → 0xb0)
+- TLS capacity: conservative (32/24/16 blocks)
+
+**Implementation files**:
+- `core/hakmem_smallmid.h/c`: TLS freelist + backend delegation
+- `core/hakmem_tiny.c`: auto-adjustment of `tiny_get_max_size()` (limited to C0-C5 when Small-Mid is ON)
+- `core/box/hak_alloc_api.inc.h`: Small-Mid placed ahead of Tiny (routing order)
+
+### 5.7 A/B Benchmark Results (Phase 17-1)
+
+| Size | Config A (OFF) | Config B (ON) | Change | Goal met |
+|------|----------------|---------------|--------|----------|
+| **256B** | 5.87M ops/s | 6.06M ops/s | **+3.3%** | ❌ |
+| **512B** | 6.02M ops/s | 5.91M ops/s | **-1.9%** | ❌ |
+| **1024B** | 5.58M ops/s | 5.54M ops/s | **-0.6%** | ❌ |
+| **Overall** | 5.82M ops/s | 5.84M ops/s | **+0.3%** | ❌ |
+
+### 5.8 Phase 17-1 outcomes and lessons
+
+✅ **What worked**:
+1. **Layer separation achieved** - Small-Mid and Tiny coexist cleanly
+2. **Minimal overhead** - the overall change of +0.3% is within measurement noise (clean implementation)
+3. **Routing order fixed** - works correctly in Small-Mid → Tiny order
+4. **Auto-adjust feature** - Tiny is automatically limited to C0-C5 when Small-Mid is ON
+5. **Foundation complete** - from here on, optimization can only improve things
+
+❌ **What did not work**:
+- **No performance improvement** (+0.3% is far short of the 2-4x goal)
+
+**Root cause analysis**:
+1. **Delegation overhead = TLS savings**
+   - Small-Mid TLS alloc: ~3-5 instructions
+   - Tiny backend delegation: ~3-5 instructions
+   - Header conversion (0xa0 → 0xb0): ~2 instructions
+   - **Net gain: ~0 instructions** (the overhead cancels out the savings)
+
+2. **The backend is called one block at a time**
+   - Small-Mid delegates to Tiny 1:1 (no batching)
+   - No reduction in `hak_tiny_alloc()` / `hak_tiny_free()` calls
+   - Expected: batch refills → actual: pass-through
+
+**Lessons**:
+- **The frontend-only approach does not help** - backend delegation costs too much
+- **A dedicated backend is required next** - needs a Small-Mid SuperSlab pool independent of Tiny
+
+### 5.9 Next step: Phase 17-2 (dedicated backend)
+
+**Strategy**: dedicated Small-Mid SuperSlab backend (fully separated from Tiny)
+
+**Design**:
+1. **Dedicated SuperSlab pool** (separate from Tiny)
+   - No Tiny delegation
+   - No header conversion overhead
+   - Writes the 0xb0 header directly
+
+2. **TLS refill batching**
+   - Fetches 8-16 blocks per refill
+   - Amortizes the SuperSlab lookup cost
+   - Target: 50-70% frontend hit rate
+
+3. **Optimized free path**
+   - Reads the 0xb0 header directly → Small-Mid TLS push
+   - No backend round trip for cached blocks
+
+**Expected performance**:
+- **Frontend hits**: 1-2 instructions (TLS pop/push)
+- **Backend misses**: 5-8 instructions (batch refill)
+- **Weighted average** (60% hit): 0.6×2 + 0.4×6 = 3.6, i.e. **~4 instructions**
+- **Current Tiny path**: 8-12 instructions
+- **Expected gain**: 50-67% reduction → **2-3x throughput** ✅
+
+**Target metrics**:
+- 256B: 5.87M → 12-15M ops/s (2.0-2.6x)
+- 512B: 6.02M → 12-15M ops/s (2.0-2.5x)
+- 1024B: 5.58M → 11-14M ops/s (2.0-2.5x)
+
+**Implementation priority**:
+1. Phase 17-2.1: Dedicated SuperSlab backend (separated from Tiny)
+2. Phase 17-2.2: TLS batch refill (8-16 blocks)
+3. Phase 17-2.3: Optimized 0xb0 header fast path
+4. Phase 17-2.4: Benchmark validation (target: 12-18M ops/s)
 
 ---
 
@@ -291,43 +373,64 @@ Added a feature that lets the Tiny/Mid boundary be adjusted dynamically via an ENV variable:
 
 ---
 
-## 7. TODO for Claude Code (Phase 17 implementation list)
+## 7. TODO for Claude Code (Phase 17-2 implementation list)
 
-### 7.1 Phase 17: Small-Mid Box implementation 🚧
+### 7.1 Phase 17-1: TLS Frontend Cache ✅ done (2025-11-16)
 
-1. **Create header** (`core/hakmem_smallmid.h`)
-   - Define 5 size classes (256B/512B/1KB/2KB/4KB)
+1. ✅ **Create header** (`core/hakmem_smallmid.h`)
+   - Define 3 size classes (256B/512B/1KB)
    - Define the TLS freelist struct
    - size → class mapping function
-   - Declare the fast alloc/free API
 
-2. **Dedicated SuperSlab backend** (`core/hakmem_smallmid_superslab.c`)
-   - Small-Mid dedicated SuperSlab pool (fully separated from Tiny)
-   - Span reservation/release logic
-   - refill function
+2. ✅ **Implement backend delegation** (`core/hakmem_smallmid.c`)
+   - Delegate to Tiny C5/C6/C7
+   - Header conversion (0xa0 → 0xb0)
+   - TLS SLL pop/push
 
-3. **Implement fast alloc/free** (`core/smallmid_alloc_fast.inc.h`, `core/hakmem_smallmid.c`)
-   - Header-based fast free (reuses the Phase 7 technique)
-   - TLS SLL pop/push (same structure as Tiny)
-   - Bump allocation fallback
+3. ✅ **Auto-adjust feature** (`core/hakmem_tiny.c`)
+   - Automatically limit Tiny to C0-C5 when Small-Mid is ON
+   - Dynamic adjustment of `tiny_get_max_size()`
+
+4. ✅ **Routing integration** (`hak_alloc_api.inc.h`)
+   - Place Small-Mid ahead of Tiny
    - ENV control: `HAKMEM_SMALLMID_ENABLE=1`
 
-4. **Routing integration** (`hak_alloc_api.inc.h`)
-   - Add the Small-Mid layer (256B-4KB)
-   - Toggle ON/OFF via ENV
+5. ✅ **A/B benchmark**
+   - Ran Config A/B (3 runs each)
+   - Result: +0.3% overall (no performance improvement)
+   - Lesson: frontend-only does not help; a dedicated backend is required
 
-5. **A/B benchmark**
-   - Config A: Small-Mid OFF (current state)
-   - Config B: Small-Mid ON (new implementation)
-   - Measure performance at 256B/512B/1KB/2KB/4KB
+### 7.2 Phase 17-2: Dedicated Backend 🚧 next task
 
-6. **Write documentation**
-   - `PHASE17_SMALLMID_BOX_DESIGN.md` - design document
-   - `PHASE17_SMALLMID_AB_RESULTS.md` - A/B test results
+**Goal**: 2-3x performance improvement with a dedicated Small-Mid SuperSlab backend
 
-### 7.2 Other tasks (after Phase 17)
+1. **Dedicated SuperSlab backend** (`core/hakmem_smallmid_superslab.c`)
+   - Small-Mid dedicated SuperSlab pool (fully separated from Tiny)
+   - Define the slab metadata structure
+   - Span reservation/release logic
 
-1. **Detailed analysis of Phase 16 results**
+2. **TLS batch refill** (`core/smallmid_refill_box.c`)
+   - Fetch 8-16 blocks per refill
+   - Amortize the SuperSlab lookup cost
+   - Fallback handling when a refill fails
+
+3. **Optimized alloc/free path** (`core/hakmem_smallmid.c`)
+   - Write the 0xb0 header directly (no Tiny delegation)
+   - TLS hit: 1-2 instructions
+   - TLS miss: batch refill (5-8 instructions)
+
+4. **A/B benchmark**
+   - Config A: Phase 17-2 OFF (currently 5.82M ops/s)
+   - Config B: Phase 17-2 ON (target 12-15M ops/s)
+   - Measure performance at 256B/512B/1KB
+
+5. **Write documentation**
+   - `PHASE17_2_SMALLMID_BACKEND_DESIGN.md` - design document
+   - `PHASE17_2_AB_RESULTS.md` - A/B test results
+
+### 7.3 Other tasks (after Phase 17-2)
+
+1. **Detailed analysis of Phase 16/17-1 results**
    - ✅ Done
    - Recorded in CURRENT_TASK.md
 2. **C2/C3 UltraHot code cleanup**
@@ -342,11 +445,17 @@ Added a feature that lets the Tiny/Mid boundary be adjusted dynamically via an ENV variable:
 
 ---
 
-## 8. Phase 17 implementation log (in progress)
+## 8. Phase 17 implementation log
 
-### 2025-11-16
+### 2025-11-16 (Phase 17-1 complete)
 
 - ✅ Phase 16 complete; A/B test results analyzed
 - ✅ Reviewed ChatGPT's Small-Mid Box proposal
-- ✅ Updated CURRENT_TASK.md (Phase 17 design direction settled)
-- 🚧 Next: start creating the `core/hakmem_smallmid.h` header
+- ✅ Phase 17-1 implementation complete (TLS frontend + Tiny backend delegation)
+  - `core/hakmem_smallmid.h/c`: TLS freelist + backend delegation
+  - `core/hakmem_tiny.c`: auto-adjust feature
+  - `core/box/hak_alloc_api.inc.h`: routing order fix
+- ✅ A/B benchmark complete (result: +0.3% overall, no performance improvement)
+- ✅ Root cause analysis: delegation overhead = TLS savings (zero net gain)
+- ✅ Updated CURRENT_TASK.md (Phase 17-1 results + Phase 17-2 plan)
+- 🚧 Next: start implementing the Phase 17-2 dedicated backend
diff --git a/Makefile b/Makefile
index 9092b642..34cf067e 100644
--- a/Makefile
+++ b/Makefile
@@ -190,12 +190,12 @@ LDFLAGS += $(EXTRA_LDFLAGS)
 
 # Targets
 TARGET = test_hakmem
-OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o hakmem_tiny_superslab.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o
core/box/carve_push_box.o core/box/prewarm_box.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o +OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o hakmem_tiny_superslab.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/link_stubs.o core/tiny_failfast.o test_hakmem.o OBJS = $(OBJS_BASE) # Shared library SHARED_LIB = libhakmem.so -SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o hakmem_tiny_superslab_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/free_local_box_shared.o core/box/free_remote_box_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o tiny_sticky_shared.o tiny_remote_shared.o 
tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_mid_mt_shared.o hakmem_super_registry_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o +SHARED_OBJS = hakmem_shared.o hakmem_config_shared.o hakmem_tiny_config_shared.o hakmem_ucb1_shared.o hakmem_bigcache_shared.o hakmem_pool_shared.o hakmem_l25_pool_shared.o hakmem_site_rules_shared.o hakmem_tiny_shared.o hakmem_tiny_superslab_shared.o hakmem_smallmid_shared.o core/box/superslab_expansion_box_shared.o core/box/integrity_box_shared.o core/box/mailbox_box_shared.o core/box/front_gate_box_shared.o core/box/free_local_box_shared.o core/box/free_remote_box_shared.o core/box/free_publish_box_shared.o core/box/capacity_box_shared.o core/box/carve_push_box_shared.o core/box/prewarm_box_shared.o tiny_sticky_shared.o tiny_remote_shared.o tiny_publish_shared.o tiny_debug_ring_shared.o hakmem_tiny_magazine_shared.o hakmem_tiny_stats_shared.o hakmem_tiny_sfc_shared.o hakmem_tiny_query_shared.o hakmem_tiny_rss_shared.o hakmem_tiny_registry_shared.o hakmem_tiny_remote_target_shared.o hakmem_tiny_bg_spill_shared.o tiny_adaptive_sizing_shared.o hakmem_mid_mt_shared.o hakmem_super_registry_shared.o hakmem_elo_shared.o hakmem_batch_shared.o hakmem_p2_shared.o hakmem_sizeclass_dist_shared.o hakmem_evo_shared.o hakmem_debug_shared.o 
hakmem_sys_shared.o hakmem_whale_shared.o hakmem_policy_shared.o hakmem_ace_shared.o hakmem_ace_stats_shared.o hakmem_ace_controller_shared.o hakmem_ace_metrics_shared.o hakmem_ace_ucb1_shared.o hakmem_prof_shared.o hakmem_learner_shared.o hakmem_size_hist_shared.o hakmem_learn_log_shared.o hakmem_syscall_shared.o tiny_fastcache_shared.o # Pool TLS Phase 1 (enable with POOL_TLS_PHASE1=1) ifeq ($(POOL_TLS_PHASE1),1) @@ -222,7 +222,7 @@ endif # Benchmark targets BENCH_HAKMEM = bench_allocators_hakmem BENCH_SYSTEM = bench_allocators_system -BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o hakmem_tiny_superslab.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o +BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o hakmem_tiny_superslab.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o 
tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/link_stubs.o core/tiny_failfast.o bench_allocators_hakmem.o BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o @@ -399,7 +399,7 @@ test-box-refactor: box-refactor ./larson_hakmem 10 8 128 1024 1 12345 4 # Phase 4: Tiny Pool benchmarks (properly linked with hakmem) -TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o hakmem_tiny_superslab.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o 
tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/link_stubs.o core/tiny_failfast.o +TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o hakmem_tiny_superslab.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_local_box.o core/box/free_remote_box.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_mid_mt.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/link_stubs.o core/tiny_failfast.o TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE) ifeq ($(POOL_TLS_PHASE1),1) TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o diff --git a/core/box/hak_alloc_api.inc.h b/core/box/hak_alloc_api.inc.h index 74acc37e..20e38cfe 100644 --- a/core/box/hak_alloc_api.inc.h +++ 
b/core/box/hak_alloc_api.inc.h
@@ -2,7 +2,8 @@
 #ifndef HAK_ALLOC_API_INC_H
 #define HAK_ALLOC_API_INC_H
 
-#include "../hakmem_tiny.h" // For tiny_get_max_size() (Phase 16)
+#include "../hakmem_tiny.h"      // For tiny_get_max_size() (Phase 16)
+#include "../hakmem_smallmid.h"  // For Small-Mid Front Box (Phase 17-1)
 
 #ifdef HAKMEM_POOL_TLS_PHASE1
 #include "../pool_tls.h"
@@ -31,8 +32,29 @@ inline void* hak_alloc_at(size_t size, hak_callsite_t site) {
 
     uintptr_t site_id = (uintptr_t)site;
 
+    // Phase 17-1: Small-Mid Front Box (256B-1KB) - TRY FIRST!
+    // Strategy: Thin TLS cache layer; a TLS miss is served by the Tiny backend inside smallmid_alloc()
+    // ENV: HAKMEM_SMALLMID_ENABLE=1 to enable (default: OFF)
+    // CRITICAL: Must come BEFORE Tiny to avoid routing conflict
+    // When enabled, auto-adjusts Tiny to C0-C5 (0-255B only)
+    if (smallmid_is_enabled() && smallmid_is_in_range(size)) {
+#if HAKMEM_DEBUG_TIMING
+        HKM_TIME_START(t_smallmid);
+#endif
+        void* sm_ptr = smallmid_alloc(size);
+#if HAKMEM_DEBUG_TIMING
+        HKM_TIME_END(HKM_CAT_TINY_ALLOC, t_smallmid);
+#endif
+        if (sm_ptr) {
+            hkm_ace_track_alloc();
+            return sm_ptr;
+        }
+        // smallmid_alloc() returned NULL (backend failed): fall through to Mid/ACE (Tiny skipped due to auto-adjust)
+    }
+
     // Phase 16: Dynamic Tiny max size (ENV: HAKMEM_TINY_MAX_CLASS)
-    // Default: 1023B (C0-C7), can be reduced to 255B (C0-C5) to delegate 512/1024B to Mid
+    // Default: 1023B (C0-C7), reduced to 255B (C0-C5) when Small-Mid enabled
+    // Phase 17-1: Auto-adjusted to avoid overlap with Small-Mid
     if (__builtin_expect(size <= tiny_get_max_size(), 1)) {
 #if HAKMEM_DEBUG_TIMING
         HKM_TIME_START(t_tiny);
diff --git a/core/hakmem_smallmid.c b/core/hakmem_smallmid.c
index 21ef5648..8de3ff76 100644
--- a/core/hakmem_smallmid.c
+++ b/core/hakmem_smallmid.c
@@ -170,40 +170,84 @@ static inline bool smallmid_tls_push(int class_idx, void* ptr) {
 }
 
 // ============================================================================
-// Backend: Use Tiny Allocator APIs (Phase 17-1)
+// Backend Delegation (Phase 17-1: Reuse Tiny infrastructure)
 // ============================================================================
 
 /**
- * smallmid_backend_alloc - Allocate from Tiny backend
+ * smallmid_backend_alloc - Allocate from Tiny backend and convert header
  *
- * @param size Allocation size
- * @return Allocated pointer (user pointer, no Small-Mid header)
+ * @param size Allocation size (256-1024)
+ * @return User pointer with Small-Mid header (0xb0), or NULL on failure
  *
- * Phase 17-1: Delegate to existing Tiny allocator infrastructure
- * This reuses Tiny's SuperSlab/SharedPool without building dedicated backend
+ * Strategy:
+ * - Call Tiny allocator (handles C5/C6/C7 = 256B/512B/1KB)
+ * - Tiny writes header: 0xa5/0xa6/0xa7
+ * - Overwrite with Small-Mid header: 0xb0/0xb1/0xb2
  */
-static inline void* smallmid_backend_alloc(size_t size) {
+static void* smallmid_backend_alloc(size_t size) {
 #ifdef HAKMEM_SMALLMID_STATS
     __atomic_fetch_add(&g_smallmid_stats.tls_misses, 1, __ATOMIC_RELAXED);
+    __atomic_fetch_add(&g_smallmid_stats.superslab_refills, 1, __ATOMIC_RELAXED);
 #endif
 
-    // Call Tiny allocator (reuses existing SuperSlab/SharedPool)
+    // Call Tiny allocator
     void* ptr = hak_tiny_alloc(size);
-    SMALLMID_LOG("smallmid_backend_alloc(%zu) = %p (via Tiny)", size, ptr);
+    if (!ptr) {
+        SMALLMID_LOG("smallmid_backend_alloc(%zu): Tiny allocation failed", size);
+        return NULL;
+    }
+
+    // Overwrite header: Tiny (0xa0 | tiny_class) → Small-Mid (0xb0 | sm_class)
+    // Tiny class mapping: C5=256B, C6=512B, C7=1KB
+    // Small-Mid class mapping: SM0=256B, SM1=512B, SM2=1KB
+    uint8_t* base = (uint8_t*)ptr - 1;
+    uint8_t tiny_header = *base;
+    uint8_t tiny_class = tiny_header & 0x0f;
+
+    // Convert Tiny class (5/6/7) to Small-Mid class (0/1/2)
+    int sm_class = tiny_class - 5;
+    if (sm_class < 0 || sm_class >= SMALLMID_NUM_CLASSES) {
+        // Should never happen - Tiny allocated wrong class
+        SMALLMID_LOG("smallmid_backend_alloc(%zu): Invalid Tiny class %d", size, tiny_class);
+        // Header is still the untouched Tiny header here, so free via Tiny directly
+        hak_tiny_free(ptr);
+        return NULL;
+    }
+
+    // Write Small-Mid header
+    *base = 0xb0 | sm_class;
+
+    SMALLMID_LOG("smallmid_backend_alloc(%zu) = %p (Tiny C%d → SM C%d)", size, ptr, tiny_class, sm_class);
     return ptr;
 }
 
 /**
- * smallmid_backend_free - Free to Tiny backend
+ * smallmid_backend_free - Convert header and delegate to Tiny backend
  *
- * @param ptr User pointer (no Small-Mid header)
- * @param size Allocation size
+ * @param ptr  User pointer (must have Small-Mid header 0xb0)
+ * @param size Allocation size (unused, Tiny reads header)
  *
- * Phase 17-1: Delegate to existing Tiny allocator infrastructure
+ * Strategy:
+ * - Convert header: Small-Mid (0xb0 | sm_class) → Tiny (0xa0 | tiny_class)
+ * - Call Tiny free to handle deallocation
  */
-static inline void smallmid_backend_free(void* ptr, size_t size) {
-    (void)size; // Unused: Tiny free reads header, doesn't need size
-    SMALLMID_LOG("smallmid_backend_free(%p) (via Tiny)", ptr);
+static void smallmid_backend_free(void* ptr, size_t size) {
+    (void)size; // Unused - Tiny reads size from header
+
+    // Read Small-Mid header
+    uint8_t* base = (uint8_t*)ptr - 1;
+    uint8_t sm_header = *base;
+    uint8_t sm_class = sm_header & 0x0f;
+
+    // Convert Small-Mid class (0/1/2) to Tiny class (5/6/7)
+    uint8_t tiny_class = sm_class + 5;
+
+    // Write Tiny header
+    *base = 0xa0 | tiny_class;
+
+    SMALLMID_LOG("smallmid_backend_free(%p): SM C%d → Tiny C%d", ptr, sm_class, tiny_class);
+
+    // Call Tiny free
     hak_tiny_free(ptr);
 }
 
@@ -247,16 +291,16 @@ void* smallmid_alloc(size_t size) {
         return (uint8_t*)ptr + 1; // Return user pointer (skip header)
     }
 
-    // Slow path: Allocate from Tiny backend (no refill, direct delegation)
-    // Phase 17-1: Simplified - no TLS refill, just pass through to Tiny
-    void* backend_ptr = smallmid_backend_alloc(size);
-    if (!backend_ptr) {
-        SMALLMID_LOG("smallmid_alloc(%zu) = NULL (backend alloc failed)", size);
+    // TLS miss: Allocate from Tiny backend
+    // Phase 17-1: Reuse Tiny infrastructure (C5/C6/C7) instead of dedicated SuperSlab
+    ptr = smallmid_backend_alloc(size);
+    if (!ptr) {
+        SMALLMID_LOG("smallmid_alloc(%zu) = NULL (backend failed)", size);
         return NULL;
     }
 
-    SMALLMID_LOG("smallmid_alloc(%zu) = %p (backend alloc, class=%d)", size, backend_ptr, class_idx);
-    return backend_ptr; // Backend returns user pointer directly
+    SMALLMID_LOG("smallmid_alloc(%zu) = %p (backend alloc, class=%d)", size, ptr, class_idx);
+    return ptr;
 }
 
 // ============================================================================
diff --git a/core/hakmem_tiny.c b/core/hakmem_tiny.c
index 4db9d1b5..31f66f8e 100644
--- a/core/hakmem_tiny.c
+++ b/core/hakmem_tiny.c
@@ -50,10 +50,15 @@ const size_t g_tiny_class_sizes[TINY_NUM_CLASSES] = {
 
 // ============================================================================
 // Phase 16: Dynamic Tiny Max Size (ENV: HAKMEM_TINY_MAX_CLASS)
+// Phase 17-1: Auto-adjust when Small-Mid enabled
 // ============================================================================
 
+// Forward declaration for Small-Mid check
+extern bool smallmid_is_enabled(void);
+
 // Get dynamic max size for Tiny allocator based on ENV configuration
 // Default: 1023B (C0-C7), can be reduced to 255B (C0-C5)
+// Phase 17-1: Auto-reduces to 255B when Small-Mid is enabled
 size_t tiny_get_max_size(void) {
     static int g_max_class = -1;
     if (__builtin_expect(g_max_class == -1, 0)) {
@@ -70,12 +75,19 @@ size_t tiny_get_max_size(void) {
         }
     }
 
+    // Phase 17-1: Auto-adjust when Small-Mid enabled
+    // Small-Mid handles 256B-1KB, so Tiny should only handle 0-255B
+    int effective_class = g_max_class;
+    if (smallmid_is_enabled() && effective_class > 5) {
+        effective_class = 5; // Limit to C0-C5 (0-255B)
+    }
+
     // Map class to max usable size (stride - 1)
     // C0=8B, C1=16B, C2=32B, C3=64B, C4=128B, C5=256B, C6=512B, C7=1024B
     static const size_t class_to_max_size[TINY_NUM_CLASSES] = {
         7, 15, 31, 63, 127, 255, 511, 1023
     };
-    return class_to_max_size[g_max_class];
+    return class_to_max_size[effective_class];
 }
 
 // ============================================================================
diff --git a/hakmem.d b/hakmem.d
index 30aef2cd..624751bd 100644
--- a/hakmem.d
+++ b/hakmem.d
@@ -17,7 +17,8 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
  core/hakmem_ace_metrics.h core/hakmem_ace_ucb1.h core/ptr_trace.h \
  core/box/hak_exit_debug.inc.h core/box/hak_kpi_util.inc.h \
  core/box/hak_core_init.inc.h core/hakmem_phase7_config.h \
- core/box/hak_alloc_api.inc.h core/box/hak_free_api.inc.h \
+ core/box/hak_alloc_api.inc.h core/box/../hakmem_tiny.h \
+ core/box/../hakmem_smallmid.h core/box/hak_free_api.inc.h \
  core/hakmem_tiny_superslab.h core/box/../tiny_free_fast_v2.inc.h \
  core/box/../tiny_region_id.h core/box/../hakmem_build_flags.h \
  core/box/../tiny_box_geometry.h \
@@ -83,6 +84,8 @@ core/box/hak_kpi_util.inc.h:
 core/box/hak_core_init.inc.h:
 core/hakmem_phase7_config.h:
 core/box/hak_alloc_api.inc.h:
+core/box/../hakmem_tiny.h:
+core/box/../hakmem_smallmid.h:
 core/box/hak_free_api.inc.h:
 core/hakmem_tiny_superslab.h:
 core/box/../tiny_free_fast_v2.inc.h:
diff --git a/hakmem_mid_mt.d b/hakmem_mid_mt.d
index 0336a756..4bfe7fbe 100644
--- a/hakmem_mid_mt.d
+++ b/hakmem_mid_mt.d
@@ -1,2 +1,8 @@
-hakmem_mid_mt.o: core/hakmem_mid_mt.c core/hakmem_mid_mt.h
+hakmem_mid_mt.o: core/hakmem_mid_mt.c core/hakmem_mid_mt.h \
+ core/hakmem_tiny.h core/hakmem_build_flags.h core/hakmem_trace.h \
+ core/hakmem_tiny_mini_mag.h
 core/hakmem_mid_mt.h:
+core/hakmem_tiny.h:
+core/hakmem_build_flags.h:
+core/hakmem_trace.h:
+core/hakmem_tiny_mini_mag.h:
diff --git a/hakmem_smallmid.d b/hakmem_smallmid.d
new file mode 100644
index 00000000..48c77148
--- /dev/null
+++ b/hakmem_smallmid.d
@@ -0,0 +1,15 @@
+hakmem_smallmid.o: core/hakmem_smallmid.c core/hakmem_smallmid.h \
+ core/hakmem_build_flags.h core/hakmem_tiny.h core/hakmem_trace.h \
+ core/hakmem_tiny_mini_mag.h core/tiny_region_id.h \
+ core/tiny_box_geometry.h core/hakmem_tiny_superslab_constants.h \
+ core/hakmem_tiny_config.h core/ptr_track.h
+core/hakmem_smallmid.h:
+core/hakmem_build_flags.h:
+core/hakmem_tiny.h:
+core/hakmem_trace.h:
+core/hakmem_tiny_mini_mag.h:
+core/tiny_region_id.h:
+core/tiny_box_geometry.h:
+core/hakmem_tiny_superslab_constants.h:
+core/hakmem_tiny_config.h:
+core/ptr_track.h:
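
Reviewer note: the header conversion in `smallmid_backend_alloc()` / `smallmid_backend_free()` can be exercised in isolation. The following is a minimal sketch, assuming a 1-byte header whose high nibble is the owner magic (0xa0 for Tiny, 0xb0 for Small-Mid) and whose low nibble is the class index, with Tiny C5/C6/C7 mapping to Small-Mid classes 0/1/2 as in the patch. All `sketch_*` / `SKETCH_*` names are hypothetical, and the high-nibble validation is an illustrative addition; the patch itself does not check the magic before rewriting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values mirror the constants used in the patch. */
#define SKETCH_TINY_MAGIC       0xa0u  /* high nibble of a Tiny header      */
#define SKETCH_SMALLMID_MAGIC   0xb0u  /* high nibble of a Small-Mid header */
#define SKETCH_TINY_FIRST_CLASS 5      /* Tiny C5 <-> Small-Mid class 0     */
#define SKETCH_SM_NUM_CLASSES   3      /* 256B / 512B / 1KB                 */

/* Rewrite a Tiny header byte (0xa5..0xa7) as a Small-Mid header (0xb0..0xb2).
 * Returns false and leaves *hdr untouched for any other input. */
bool sketch_tiny_to_smallmid(uint8_t *hdr) {
    if ((*hdr & 0xf0u) != SKETCH_TINY_MAGIC) return false;
    int sm_class = (int)(*hdr & 0x0fu) - SKETCH_TINY_FIRST_CLASS;
    if (sm_class < 0 || sm_class >= SKETCH_SM_NUM_CLASSES) return false;
    *hdr = (uint8_t)(SKETCH_SMALLMID_MAGIC | (unsigned)sm_class);
    return true;
}

/* Inverse conversion, applied before handing a block back to Tiny.
 * Returns false and leaves *hdr untouched for any other input. */
bool sketch_smallmid_to_tiny(uint8_t *hdr) {
    if ((*hdr & 0xf0u) != SKETCH_SMALLMID_MAGIC) return false;
    unsigned sm_class = *hdr & 0x0fu;
    if (sm_class >= SKETCH_SM_NUM_CLASSES) return false;
    *hdr = (uint8_t)(SKETCH_TINY_MAGIC | (sm_class + SKETCH_TINY_FIRST_CLASS));
    return true;
}

/* Round-trip check over the three classes that Small-Mid delegates. */
void sketch_roundtrip_check(void) {
    for (unsigned c = 5; c <= 7; c++) {
        uint8_t hdr = (uint8_t)(SKETCH_TINY_MAGIC | c);
        assert(sketch_tiny_to_smallmid(&hdr));
        assert(hdr == (uint8_t)(SKETCH_SMALLMID_MAGIC | (c - 5)));
        assert(sketch_smallmid_to_tiny(&hdr));
        assert(hdr == (uint8_t)(SKETCH_TINY_MAGIC | c));
    }
}
```

Rejecting unexpected header bytes instead of rewriting them may be worth considering for `smallmid_backend_free()`, which currently trusts the low nibble of whatever byte precedes the pointer.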