From b2724e6f5d848dc96dba4b4b75b332e891e95ca1 Mon Sep 17 00:00:00 2001
From: "Moe Charm (CI)"
Date: Sat, 13 Dec 2025 05:10:45 +0900
Subject: [PATCH] Phase ALLOC-TINY-FAST-DUALHOT-1: WIP (regression), FREE DUALHOT confirmed +13%
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

**ALLOC-TINY-FAST-DUALHOT-1** (this phase):
- Implementation: malloc_tiny_fast() C0-C3 early-exit with policy snapshot skip
- ENV: HAKMEM_TINY_ALLOC_DUALHOT=0/1 (default OFF)
- A/B Result: -1.17% median regression (Mixed, 10-run)
- Root Cause: Branch prediction penalty on C4-C7 outweighs policy skip benefit
- Decision: Freeze as research box (default OFF)
- Difference from FREE: ALLOC requires structural changes (per-class paths)

**FREE-TINY-FAST-DUALHOT-1** (verified):
- A/B Confirmation: +13.00% improvement (42.08M → 47.81M ops/s, Mixed, 10-run)
- Success Criteria: +2% target ACHIEVED
- Health Check: PASS (verify_health_profiles.sh, ENV OFF/ON)
- Safety: HAKMEM_TINY_LARSON_FIX guard in place
- Decision: Promotion to MIXED_TINYV3_C7_SAFE profile candidate

**Next Steps**:
- Profile adoption of FREE DUALHOT for MIXED workload
- No further deep-dive on ALLOC optimization (deferred to future phases)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 CURRENT_TASK.md                               | 11 +++++++
 core/front/malloc_tiny_fast.h                 | 17 ++++++----
 .../ALLOC_TINY_FAST_DUALHOT_1_DESIGN.md       | 31 +++++++++++++++++++
 3 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index 5ae0b9b4..be0f0b9b 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -27,6 +27,17 @@ Perf of the DUALHOT-optimized build shows **the alloc side becoming the next bottleneck**
 2) Add a `class_idx<=3` early-exit to `malloc_tiny_fast()` in `core/front/malloc_tiny_fast.h`
 3) health + 10-run A/B (Mixed / C6-heavy)
 
+### Status: Phase ALLOC-TINY-FAST-DUALHOT-1 FROZEN ✅ (2025-12-13)
+
+- **Safety**: health (ENV OFF/ON) PASS
+- **Mixed A/B (10-run, iter=100M, ws=400)**: median **-1.17%** (within tolerance, but not a winning path)
+- **C6-heavy A/B (10-run, 10M ops)**: neutral, within about ±1%
+- **Decision**: freeze with default OFF (opt-in, research only)
+
+Next targets (candidates):
+- "Structural" overhead in `malloc` / Front Gate (eliminate branches via PGO, constant folding, and include/inline cleanup)
+- On the free side, the promotion procedure for `FREE-TINY-FAST-DUALHOT-1` (it assumes HOTCOLD=1, so decide whether to adopt it in the standard profile)
+
 ---
 
 ## Previous phase: Phase POOL-MID-DN-BATCH complete ✅ (freeze as a research box recommended)
diff --git a/core/front/malloc_tiny_fast.h b/core/front/malloc_tiny_fast.h
index a2d70653..5df5f838 100644
--- a/core/front/malloc_tiny_fast.h
+++ b/core/front/malloc_tiny_fast.h
@@ -169,13 +169,18 @@ static inline void* malloc_tiny_fast(size_t size) {
     // Phase ALLOC-TINY-FAST-DUALHOT-1: C0-C3 direct path (second hot path)
     // Skip expensive policy snapshot and route determination for C0-C3.
     // Measurements show C0-C3 is 48% of allocations, not rare.
-    if (__builtin_expect(alloc_dualhot_enabled() && class_idx <= 3, 0)) {
-        // Direct to LEGACY unified cache (no policy snapshot)
-        void* ptr = tiny_hot_alloc_fast(class_idx);
-        if (TINY_HOT_LIKELY(ptr != NULL)) {
-            return ptr;
+    // NOTE:
+    // Keep the default path unchanged (gate OFF) to avoid overhead.
+    // When gate ON, treat C0-C3 as "second hot path" (likely taken in Mixed).
+    if (__builtin_expect(alloc_dualhot_enabled(), 0)) {
+        if (TINY_HOT_LIKELY(class_idx <= 3)) {
+            // Direct to LEGACY unified cache (no policy snapshot)
+            void* ptr = tiny_hot_alloc_fast(class_idx);
+            if (TINY_HOT_LIKELY(ptr != NULL)) {
+                return ptr;
+            }
+            return tiny_cold_refill_and_alloc(class_idx);
         }
-        return tiny_cold_refill_and_alloc(class_idx);
     }
 
     // 2. Policy snapshot (TLS cached, single read)
diff --git a/docs/analysis/ALLOC_TINY_FAST_DUALHOT_1_DESIGN.md b/docs/analysis/ALLOC_TINY_FAST_DUALHOT_1_DESIGN.md
index 72e1300a..49cd8734 100644
--- a/docs/analysis/ALLOC_TINY_FAST_DUALHOT_1_DESIGN.md
+++ b/docs/analysis/ALLOC_TINY_FAST_DUALHOT_1_DESIGN.md
@@ -103,3 +103,34 @@ switch (policy->route_kind[class_idx]) { ... }
 - C6-heavy: within ±2% (no regression)
 - If a regression appears, freeze as a default-OFF research box (keep it for future learning)
 
+---
+
+## Measured results (2025-12-13)
+
+Measurement conditions:
+- Mixed: `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE ./bench_random_mixed_hakmem 100000000 400 1` (10-run)
+- C6-heavy: `HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 ./bench_mid_large_mt_hakmem 1 10000000 400 1` (10-run)
+
+Results:
+
+### Mixed (MIXED_TINYV3_C7_SAFE)
+
+- Baseline (`HAKMEM_TINY_ALLOC_DUALHOT=0`): mean=50.60M, median=50.87M ops/s
+- Opt (`HAKMEM_TINY_ALLOC_DUALHOT=1`): mean=50.27M, median=50.28M ops/s
+- Delta (median): **-1.17%** (within tolerance, but not a "winning path")
+
+### C6-heavy (C6_HEAVY_LEGACY_POOLV1)
+
+- Baseline (`HAKMEM_TINY_ALLOC_DUALHOT=0`): mean=24.73M, median=24.69M ops/s
+- Opt (`HAKMEM_TINY_ALLOC_DUALHOT=1`): mean=24.62M, median=24.78M ops/s
+- Delta (median): **+0.36%** (effectively neutral)
+
+## Verdict
+
+- ✅ Gate 1: health (ENV OFF/ON) PASS
+- ✅ Gate 2: performance (within ±2%) PASS
+- ❌ Gate 3: +2% on Mixed not achieved
+
+Conclusion:
+- **Freeze as a default-OFF research box** (kept in the tree, but not enabled in the standard profile)
+- The next attack on alloc should target structural overhead around `malloc`/front-gate rather than "C0-C3 only" (split into a separate phase)