hakmem/docs/analysis/PHASE2_STRUCTURAL_CHANGES_NEXT_INSTRUCTIONS.md
Moe Charm (CI) d54893ea1d Phase 3 C3: Static Routing A/B Test ADOPT (+2.20% Mixed gain)
Step 2 & 3 Complete:
- A/B test (Mixed 10-run): STATIC_ROUTE=0 (38.91M) → =1 (39.77M) = +2.20% avg
  - Median gain: +1.98%
  - Result: GO (exceeds +1.0% threshold)

- Decision: ADOPT into MIXED_TINYV3_C7_SAFE preset
  - bench_profile.h line 77: HAKMEM_TINY_STATIC_ROUTE=1 default
  - Learner auto-disables static route when HAKMEM_SMALL_LEARNER_V7_ENABLED=1

Implementation Summary:
- core/box/tiny_static_route_box.{h,c}: Research box (Step 1A)
- core/front/malloc_tiny_fast.h: Route lookup integration (Step 1B, lines 249-256)
- core/bench_profile.h: Bench sync + preset adoption

Cumulative Phase 2-3 Gains:
- B3 (Routing shape): +2.89%
- B4 (Wrapper split): +1.47%
- C3 (Static routing): +2.20%
- Total: ~6.8% (35.2M → ~39.8M ops/s)

Next: Phase 3 C1 (TLS Prefetch, expected +2-4%)

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 18:46:11 +09:00


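# Phase 2: Structural Changes Completion Report (2025-12-13)
## Overview
Phase 2 focused on the **wrapper layer and the routing shape of the allocation path**, implementing three sub-phases (B1, B3, B4). The **combination of B3 + B4 ultimately yielded an estimated +4.4% improvement**.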
## Implementation
### B1: Header Tax Reduction v2 ❌ NO-GO
**Goal**: Reduce conditional branches on header writes (HEADER_MODE=LIGHT)
**Result**:
- Mixed (10-run): 48.89M → 47.65M ops/s (**-2.54%**, regression)
- Reason: the overhead of the condition check outweighs the savings from fewer memory stores
- Decision: **FREEZE** (research box, ENV opt-in)
### B3: Routing Branch Shape Optimization ✅ ADOPT
**Goal**: Move the rare routes (V7, MID, ULTRA) inside malloc_tiny_fast() to noinline,cold
**Implementation**:
- HAKMEM_TINY_ALLOC_ROUTE_SHAPE dispatch at core/front/malloc_tiny_fast.h:252-267
- Hot path: LIKELY on LEGACY (the majority of C0-C7)
- Cold path: V7/MID/ULTRA branches moved to the cold region
**Result**:
- Mixed (10-run): 48.41M → 49.80M ops/s (**+2.89%**, win)
- C6-heavy (5-run): 8.97M → 9.79M ops/s (**+9.13%**, strong)
- Decision: **ADOPT as default** in `MIXED_TINYV3_C7_SAFE` / `C6_HEAVY_LEGACY_POOLV1`
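The hot/cold routing shape described for B3 can be sketched roughly as follows. This is a minimal illustration, not the code in core/front/malloc_tiny_fast.h: every `sketch_` / `SKETCH_` name is hypothetical, plain `malloc` stands in for the real fast path, and the attributes and `__builtin_expect` assume GCC or Clang.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout: these are not the hakmem definitions. */
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)

typedef enum {
    SKETCH_ROUTE_LEGACY = 0, /* common case */
    SKETCH_ROUTE_V7,
    SKETCH_ROUTE_MID,
    SKETCH_ROUTE_ULTRA
} sketch_route_t;

/* Counts cold-path entries so the dispatch can be observed. */
static unsigned long sketch_cold_calls = 0;

/* Rare routes are kept out of line and placed in the cold text section. */
__attribute__((noinline, cold))
static void *sketch_alloc_cold(sketch_route_t route, size_t size) {
    (void)route; /* a real allocator would switch on the route here */
    sketch_cold_calls++;
    return malloc(size);
}

/* Hot path: a single predicted-taken branch for the LEGACY route. */
static inline void *sketch_alloc(sketch_route_t route, size_t size) {
    if (SKETCH_LIKELY(route == SKETCH_ROUTE_LEGACY)) {
        return malloc(size); /* stand-in for the LEGACY fast path */
    }
    return sketch_alloc_cold(route, size);
}
```

Calling `sketch_alloc(SKETCH_ROUTE_LEGACY, 16)` stays on the inlined branch, while any other route increments `sketch_cold_calls` through the out-of-line function. The intent of this shape is to keep the rare-route code out of the hot function body.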
### B4: Wrapper Layer Hot/Cold Split ✅ ADOPT
**Goal**: Move the "rare checks" at the wrapper entry (LD mode, jemalloc, force_libc, diagnostics) to noinline,cold
**Implementation**:
- malloc_cold() (noinline,cold): handles LD mode, jemalloc, force_libc, BenchFast, and init wait
- malloc() hot/cold dispatch: HAKMEM_WRAP_SHAPE=1 ENV gate
- free_cold() (noinline,cold): pointer classification, ownership check, header validation, and all fallbacks
- free() hot/cold dispatch: BenchFast → Tiny fast → delegate to free_cold()
**Result**:
- Mixed (10-run): 34,750,578 → 35,262,596 ops/s (**+1.47%**, average)
- Decision: **ADOPT as default** in `MIXED_TINYV3_C7_SAFE` (`HAKMEM_WRAP_SHAPE=1`)
## Cumulative Effect
```
Phase 2 Combined (B3 + B4):
B3 routing shape: +2.89%
B4 wrapper shape: +1.47%
─────────────────────────
Estimated total: ~+4.4%
```
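The estimate above adds the two percentages. Since B3 and B4 were measured against different baselines, compounding is arguably the closer model, and at this scale both round to about +4.4%. A small sketch of the arithmetic (the helper names are illustrative, not part of hakmem):

```c
/* Relative change from `before` to `after`, in percent. */
static double rel_gain_pct(double before, double after) {
    return (after - before) / before * 100.0;
}

/* Simple sum of percentage gains (the method used in the estimate). */
static double sum_gains_pct(const double *pct, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += pct[i];
    }
    return total;
}

/* Compounded gain: multiply the (1 + g/100) factors, then convert back. */
static double compound_gains_pct(const double *pct, int n) {
    double factor = 1.0;
    for (int i = 0; i < n; i++) {
        factor *= 1.0 + pct[i] / 100.0;
    }
    return (factor - 1.0) * 100.0;
}
```

With `{2.89, 1.47}` the sum is 4.36 and the compounded value is about 4.40. `rel_gain_pct(34750578.0, 35262596.0)` reproduces the B4 figure of roughly +1.47%.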
## Important Synchronization Mechanism
**How ENV settings made by bench_profile reach the wrapper**:
```
core/bench_profile.h:9 → includes wrapper_env_box.h
core/box/wrapper_env_box.c:49-64 → implements wrapper_env_refresh_from_env()
```
Calling wrapper_env_refresh_from_env() after bench_profile() propagates the values set via putenv() to the wrapper-side ENV cache.
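Why the refresh is needed can be illustrated with a cached environment flag. The sketch below is an assumption about the general pattern, not the contents of core/box/wrapper_env_box.c: the `sketch_` names are hypothetical, only `HAKMEM_WRAP_SHAPE` comes from this report, and `setenv` (POSIX) would stand in for the `putenv()` call mentioned above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv/unsetenv under strict -std= modes */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper-side cache: -1 means "not read yet". */
static int sketch_wrap_shape = -1;

/* Treat exactly "1" as enabled; anything else (or unset) is disabled. */
static int sketch_read_flag(const char *name) {
    const char *v = getenv(name);
    return (v != NULL && strcmp(v, "1") == 0) ? 1 : 0;
}

/* Hot-path accessor: reads the environment once, then serves the cache. */
static inline int sketch_wrap_shape_enabled(void) {
    if (sketch_wrap_shape < 0) {
        sketch_wrap_shape = sketch_read_flag("HAKMEM_WRAP_SHAPE");
    }
    return sketch_wrap_shape;
}

/* Re-read the environment after a profile has modified it. */
static void sketch_env_refresh(void) {
    sketch_wrap_shape = sketch_read_flag("HAKMEM_WRAP_SHAPE");
}
```

If the accessor runs before a profile sets `HAKMEM_WRAP_SHAPE=1`, the cache keeps returning 0 until `sketch_env_refresh()` is called. That stale-cache window is the problem the refresh call after bench_profile() addresses.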
## Next Phase: Phase 3 Plan
Goal: Cache locality optimization (+12-22%)
**C3 (priority: highest)**: Static Routing
- Identify hot spots with perf top
- Bypass policy_snapshot in malloc_tiny_fast()
- Expected: +5-8%
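One way to read "bypass policy_snapshot" is to resolve the route for each size class once and serve later lookups from a table. The sketch below only illustrates that idea and is not the design of tiny_static_route_box: all `sk_` names are hypothetical, the example policy is arbitrary, and the class count of 8 is assumed from the C0-C7 naming.

```c
enum { SK_NUM_CLASSES = 8 }; /* C0-C7, assumed from the class names in this report */

typedef enum { SK_ROUTE_LEGACY = 0, SK_ROUTE_V7, SK_ROUTE_MID, SK_ROUTE_ULTRA } sk_route_t;

/* Stand-in for a per-call policy query; counts how often it runs. */
static unsigned long sk_policy_queries = 0;

static sk_route_t sk_policy_route(int cls) {
    sk_policy_queries++;
    return (cls == 7) ? SK_ROUTE_ULTRA : SK_ROUTE_LEGACY; /* arbitrary example policy */
}

static sk_route_t sk_static_route[SK_NUM_CLASSES];
static int sk_static_ready = 0;

/* Resolve every class once, for example during initialization. */
static void sk_static_route_init(void) {
    for (int c = 0; c < SK_NUM_CLASSES; c++) {
        sk_static_route[c] = sk_policy_route(c);
    }
    sk_static_ready = 1;
}

/* Hot path: a table load when static routing is ready, else a policy query. */
static inline sk_route_t sk_route_for_class(int cls) {
    if (sk_static_ready) {
        return sk_static_route[cls];
    }
    return sk_policy_route(cls);
}
```

After `sk_static_route_init()`, repeated `sk_route_for_class()` calls no longer touch the policy function, so `sk_policy_queries` stays at 8. A table like this is only valid while the policy is fixed, which fits the note in the commit message that the learner disables static routing.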
**C1/C2**: TLS prefetch + metadata cache optimization
- Expected: +2-4% (C1) and +5-10% (C2)
---
To the next owner: starting with C3 is recommended.