Phase 35-39: FAST build optimization complete (+7.13% cumulative)

Phase 35-A: BENCH_MINIMAL gate function elimination (GO +4.39%)
- tiny_front_v3_enabled() → constant true
- tiny_metadata_cache_enabled() → constant 0
- learner_v7_enabled() → constant false
- small_learner_v2_enabled() → constant false

Phase 36: Policy snapshot init-once (GO +0.71%)
- small_policy_v7_snapshot() version check skip in BENCH_MINIMAL
- TLS cache for policy snapshot

Phase 37: Standard TLS cache (NO-GO -0.07%)
- TLS cache for Standard build attempted
- Runtime gate overhead negates benefit

Phase 38: FAST/OBSERVE/Standard workflow established
- make perf_fast, make perf_observe targets
- Scorecard and documentation updates

Phase 39: Hot path gate constantization (GO +1.98%)
- front_gate_unified_enabled() → constant 1
- alloc_dualhot_enabled() → constant 0
- g_bench_fast_front, g_v3_enabled blocks → compile-out
- free_dispatch_stats_enabled() → constant false

Results:
- FAST v3: 56.04M ops/s (47.4% of mimalloc)
- Standard: 53.50M ops/s (45.3% of mimalloc)
- M1 target (50%): 5.5% remaining

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 15:01:56 +09:00
parent 506e724c3b
commit b7085c47e1
22 changed files with 1550 additions and 397 deletions
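The gate constantization named in the commit message (for example `tiny_front_v3_enabled()` → constant true) can be pictured with the minimal sketch below. The flag `HAKMEM_BENCH_MINIMAL` and the gate name come from the commit. The function bodies, the environment variable `HAKMEM_TINY_FRONT_V3`, the caching scheme, and the caller `alloc_path_version()` are assumptions made for illustration and are not the project's actual code.

```c
#include <stdlib.h>

/* Assumption: the real project sets this flag in its build-flags header or Makefile. */
#ifndef HAKMEM_BENCH_MINIMAL
#  define HAKMEM_BENCH_MINIMAL 0
#endif

#if HAKMEM_BENCH_MINIMAL
/* FAST build: the gate is a compile-time constant, so callers' branches can fold away. */
static inline int tiny_front_v3_enabled(void) { return 1; }
#else
/* Standard build: runtime gate, read from the environment once and then cached. */
static inline int tiny_front_v3_enabled(void) {
    static int cached = -1;                              /* -1 means "not read yet" */
    if (cached < 0) {
        const char *e = getenv("HAKMEM_TINY_FRONT_V3");  /* hypothetical variable name */
        cached = (e == NULL || e[0] != '0');             /* assumed default: enabled */
    }
    return cached;
}
#endif

/* Hypothetical hot-path caller: with the constant gate the compiler can drop the v1 arm. */
static int alloc_path_version(void) {
    return tiny_front_v3_enabled() ? 3 : 1;
}
```

Building with `-DHAKMEM_BENCH_MINIMAL=1` selects the constant variant. Both variants stay in the source, which matches the compile-out rather than delete rule stated in the diff below.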
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -1,362 +1,44 @@
-# Mainline Task (Current)
+# CURRENT_TASK (Rolling)

-## Current Status (Summary)
+## 0) Current Sources of Truth (Phase 39)

- **Stable version (mainline)**: Phase 31 complete (g_tiny_free_trace compile-out): NEUTRAL verdict, adopted for code cleanliness
- **Recent decisions**:
-  - Phase 24 (OBSERVE tax prune / tiny_class_stats): ✅ GO (+0.93%)
-  - Phase 25 (Free Stats atomic prune / g_free_ss_enter): ✅ GO (+1.07%)
-  - Phase 26 (Hot path diagnostic atomics prune / 5 atomics): ⚪ NEUTRAL (-0.33%, adopted for code cleanliness)
-  - Phase 27 (Unified Cache Stats atomic prune / 6 atomics): ✅ GO (+0.74% mean, +1.01% median)
-  - Phase 28 (Background Spill Queue audit / 8 atomics): ⚪ NO-OP (all CORRECTNESS)
-  - Phase 29 (Pool Hotbox v2 Stats audit / 12 atomics): ⚪ NO-OP (ENV-gated, never executed)
-  - Phase 30 (Standard Procedure Documentation): ✅ PROCEDURE COMPLETE (audit of 412 atomics complete)
-  - Phase 31 (Tiny Free Trace atomic prune / g_tiny_free_trace): ⚪ NEUTRAL (-0.35%, adopted for code cleanliness)
- **Measurement source of truth**: `scripts/run_mixed_10_cleanenv.sh` (same binary / clean env / 10-run)
- **Cumulative effect**: **+2.74%** (Phase 24: +0.93% + Phase 25: +1.07% + Phase 26: NEUTRAL + Phase 27: +0.74% + Phase 28: NO-OP + Phase 29: NO-OP + Phase 30: PROCEDURE + Phase 31: NEUTRAL)
- **Target/status scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
+- **Source of truth for performance comparison**: **FAST build** (`make perf_fast`)
+- **Source of truth for safety/compatibility**: Standard build (`make bench_random_mixed_hakmem`)
+- **Source of truth for observation**: OBSERVE build (`make perf_observe`)
+- **Scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
+- **Measurement source of truth (Mixed 10-run)**: `scripts/run_mixed_10_cleanenv.sh` (`ITERS=20000000 WS=400`)

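-## Principles (Box Theory operating rules)
+## 1) Current Status (latest snapshot)
+
+- FAST v3: **56.04M ops/s** (**47.4%** of mimalloc)
+- Standard: **53.50M ops/s** (**45.3%** of mimalloc)
+
+Note: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` is the source of truth for details (only the key points are listed here).
+
+## 2) Principles (Box Theory operations)

 - Split changes into boxes (revertible via ENV / build flag)
- Consolidate conversion points (boundaries) in one place
- "Deleting code to go faster" is dangerous (the result can flip under layout/LTO)
-  - ✅ compile-out (`#if HAKMEM_*_COMPILED`) is allowed
-  - ❌ link-out (removing `.o` from the Makefile) is sealed off (Phase 22-2 NO-GO)
- **Atomic audit principles** (standardized in Phase 30):
-  - **Step 0: Execution verification (MANDATORY)**: check ENV gates / execution counters (Phase 29 lesson)
-  - **Step 1: CORRECTNESS vs TELEMETRY classification**: used in an `if` condition = CORRECTNESS (Phase 28 lesson)
-  - **Step 2: Compile-out implementation**: wrap with `#if HAKMEM_*_COMPILED`
-  - **Step 3: A/B test**: Baseline vs Compiled-in (10-run comparison)
-  - **Verdict**: GO (+0.5%+), NEUTRAL (±0.5%), NO-GO (-0.5%+)
+- Keep a single boundary (do not add conversion points)
+- **"Deleting code to go faster" (link-out / large deletions) is sealed off** (the sign can flip under layout/LTO)
+  - ✅ compile-out (`#if HAKMEM_*_COMPILED` / `#if HAKMEM_BENCH_MINIMAL`) is allowed
+  - ❌ Removing `.o` from the Makefile / physically deleting code is avoided as a rule (Phase 22-2 NO-GO)
+- Run A/B tests on the **same binary** via toggles (ENV / build flag). Comparing separate binaries mixes in layout effects.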

-## Phase 30 Complete (2025-12-16)
+## 3) Next Instructions

-### Work Performed
+TBD (Phase 39 complete)

-**Goal:** Codify the lessons of Phase 24-29 as a 4-step standard procedure and select Phase 31 candidates.
+## 4) Recent Log (key points only)

-**Deliverables:**
-1. `docs/analysis/PHASE30_STANDARD_PROCEDURE.md` - 4-step standard procedure document
-2. `docs/analysis/ATOMIC_AUDIT_FULL.txt` - full atomic audit results (412 atomics)
-3. `docs/analysis/PHASE31_CANDIDATES_HOT.txt` - HOT path candidate extraction
-4. `docs/analysis/PHASE31_CANDIDATES_WARM.txt` - WARM path candidate extraction
-5. `docs/analysis/PHASE31_RECOMMENDED_CANDIDATES.md` - recommended Phase 31 candidates (TOP 3)
-6. `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` updated (Phase 30 appended)
+- Phase 24–34: atomic prune, cumulative **+2.74%** (diminishing returns afterwards)
+- Phase 35-A: `HAKMEM_BENCH_MINIMAL=1` (gate prune) **GO +4.39%**
+- Phase 36: FAST-only policy snapshot optimization **GO +0.71%**
+- Phase 37: Standard TLS cache **NO-GO** (the runtime gate tax outweighs the benefit)
+- Phase 38: FAST/OBSERVE/Standard workflow established (scorecard + Makefile targets)
+- Phase 39: FAST v3 gate constantization **GO +1.98%**
+  - Detailed results: `docs/analysis/PHASE39_FAST_V3_GATE_CONSTANTIZATION_RESULTS.md`

-### Audit Results
+## 5) Archive

-**Full atomic audit:**
- **Total atomics:** 412
- **TELEMETRY:** 104 (25%)
- **CORRECTNESS:** 24 (6%)
- **UNKNOWN:** 284 (69%, manual review needed)
+- The old `CURRENT_TASK.md` (detailed log) is archived at `archive/CURRENT_TASK_ARCHIVE_20251216.md`

-**Path classification:**
- **HOT path:** 16 atomics (5 TELEMETRY, 11 UNKNOWN)
- **WARM path:** 10 atomics (3 TELEMETRY, 7 UNKNOWN)
- **COLD path:** 386 atomics (remaining)
-
-**NEW candidates (not yet compiled out):**
- **HOT path:** 1 candidate (`g_tiny_free_trace`)
- **WARM path:** 3 candidates (`rel_logs`, `dbg_logs`, `g_p0_class_oob_log`)
-
-### Step 0 Execution Verification Results
-
-**HOT path:**
-1. `g_tiny_free_trace` (HOT, TELEMETRY)
-   - ✅ No ENV gate
-   - ✅ Executed in `hak_tiny_free()` (on every call)
-   - ✅ Execution verified
-   - **Verdict:** **TOP PRIORITY for Phase 31**
-
-**WARM path:**
-1. `rel_logs` + `dbg_logs` (WARM, TELEMETRY)
-   - ❌ ENV gated by `HAKMEM_TINY_WARM_LOG` (OFF by default)
-   - ❌ Never executed (Phase 29 pattern)
-   - **Verdict:** SKIP
-
-2. `g_p0_class_oob_log` (WARM, TELEMETRY)
-   - ✅ No ENV gate
-   - ⚠️ Error path (out-of-bounds class index)
-   - ❓ Execution frequency unknown (needs verification)
-   - **Verdict:** LOW PRIORITY (Phase 32 candidate)
-
-### 4-Step Standard Procedure
-
-**Pattern established in Phase 30:**
-
-**Step 0: Execution verification (NEW, Phase 29 lesson)**
- Check ENV gates (`rg "getenv.*FEATURE" core/`)
- Check execution counters (> 0 in a Mixed 10-run)
- perf/flamegraph verification (optional)
- **Decision:** ❌ not executed → SKIP
-
-**Step 1: CORRECTNESS/TELEMETRY classification (Phase 28 lesson)**
- Trace every usage site (`rg -n "g_variable" core/`)
- Used in an `if` condition → CORRECTNESS (DO NOT TOUCH)
- `fprintf/fprintf` only → TELEMETRY (compile-out candidate)
- **Decision:** CORRECTNESS → DO NOT TOUCH
-
-**Step 2: Compile-out implementation (Phase 24-27 pattern)**
- Add a gate to `hakmem_build_flags.h`
- Wrap the TELEMETRY atomic in `#if`
- Build-level compile-out (link-out prohibited)
-
-**Step 3: A/B Test (build-level comparison)**
- Baseline (COMPILED=0): default build
- Compiled-in (COMPILED=1): research build
- **Verdict:** GO (+0.5%+), NEUTRAL (±0.5%), NO-GO (-0.5%+)
-
-### Verdict
-
-**PROCEDURE COMPLETE** ✅
-
-**Reasons:**
- 4-step procedure established (systematizes the Phase 24-29 learnings)
- Step 0 (execution verification) prevents a Phase 29-style wasted attempt
- Full atomic audit complete (412 atomics)
- Phase 31 candidate selection complete (TOP 1: `g_tiny_free_trace`)
-
-### Documents
-
- `docs/analysis/PHASE30_STANDARD_PROCEDURE.md` (standard procedure document)
- `docs/analysis/ATOMIC_AUDIT_FULL.txt` (full atomic audit results)
- `docs/analysis/PHASE31_RECOMMENDED_CANDIDATES.md` (Phase 31 candidates, TOP 3)
- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24-30 summary)
-
-### Lessons
-
-**Three principles for avoiding wasted attempts:**
-1. **Step 0 is a mandatory gate**: reject ENV-gated code first (Phase 29 lesson)
-2. **Counter name ≠ purpose**: check every usage site to tell flow control from telemetry (Phase 28 lesson)
-3. **HOT path first**: execution frequency determines performance impact (Phase 24-27 lesson)
-
-## Cumulative Effect (Phase 24+25+26+27+28+29+30+31)
-
-| Phase | Target | Impact | Status |
-|-------|--------|--------|--------|
-| **24** | `g_tiny_class_stats_*` (5 atomics) | **+0.93%** | GO ✅ |
-| **25** | `g_free_ss_enter` (1 atomic) | **+1.07%** | GO ✅ |
-| **26** | Hot path diagnostics (5 atomics) | **-0.33%** | NEUTRAL ✅ |
-| **27** | `g_unified_cache_*` (6 atomics) | **+0.74%** | GO ✅ |
-| **28** | Background Spill Queue (8 atomics) | **N/A** | NO-OP ✅ |
-| **29** | Pool Hotbox v2 Stats (12 atomics) | **0.00%** | NO-OP ✅ |
-| **30** | Standard Procedure (412 atomic audit) | **N/A** | PROCEDURE ✅ |
-| **31** | `g_tiny_free_trace` (1 atomic) | **-0.35%** | NEUTRAL ✅ |
-| **Total** | **18 atomics removed, 412 audited** | **+2.74%** | **✅** |
-
-**Key Insight:** The standard procedure raises the success probability of the next phase.
- Step 0 (execution verification) rejects ENV-gated code → prevents a Phase 29-style wasted attempt
- Step 1 (classification) rejects CORRECTNESS atomics → prevents a Phase 28-style misjudgment
- HOT path first → Phase 24-27 success pattern (+0.5~1.0%)
- **NEW:** Even a NEUTRAL verdict can be adopted for code cleanliness → Phase 26/31 pattern
-
-## Phase 31: g_tiny_free_trace compile-out Complete (2025-12-16)
-
-### Work Performed
-
-**Goal:** Compile out (by default) the trace-rate-limit atomic at the top of `hak_tiny_free()` to remove a fixed tax.
-
-**Deliverables:**
-1. `docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md` - A/B test results + NEUTRAL verdict
-2. `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` updated (Phase 31 appended)
-3. `CURRENT_TASK.md` updated (Phase 31 complete + Phase 32 candidates presented)
-
-### A/B Test Results
-
-**Baseline (COMPILED=0, trace compiled-out):**
- Mean: 53.64 M ops/s
- Median: 53.80 M ops/s
-
-**Compiled-in (COMPILED=1, trace active):**
- Mean: 53.83 M ops/s
- Median: 53.70 M ops/s
-
-**Difference:**
- Mean: -0.35% (Baseline SLOWER)
- Median: +0.19% (Baseline FASTER)
- **Verdict:** **NEUTRAL** (within ±0.5%)
-
-### Verdict
-
-**NEUTRAL → adopted for code cleanliness** ✅
-
-**Reasons:**
-1. **Performance:** Mean -0.35%, Median +0.19% → within measurement noise (conflicting signals)
-2. **Phase 26 precedent:** -0.33% NEUTRAL → adopted for code cleanliness
-3. **Phase 31 is the same case:** -0.35% NEUTRAL → the same criterion is applied
-4. **Code cleanliness benefits:**
-   - Removes an unused TELEMETRY atomic from the HOT path (`hak_tiny_free()` entry)
-   - Reduces complexity (trace macro only, no flow control)
-   - Preserves research flexibility (can be restored with `COMPILED=1`)
-
-**Key Finding:** Not all HOT path atomics have measurable overhead
- Phase 25 (`g_free_ss_enter`): +1.07% GO (always-increment stats)
- Phase 31 (`g_tiny_free_trace`): NEUTRAL (rate-limited to 128 calls)
- **Hypothesis:** Rate-limiting or compiler optimization may eliminate overhead
-
-### Documents
-
- `docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md` (complete A/B test results)
- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24-31 summary)
- `CURRENT_TASK.md` (Phase 31 complete + Phase 32 candidates)
-
-### Lessons
-
-**What Phase 31 taught us:**
-1. **HOT path ≠ guaranteed win:** Even high-frequency atomics may have zero overhead if optimized
-2. **NEUTRAL is valid:** Code cleanliness justifies compile-out even without performance gain (Phase 26/31 precedent)
-3. **Step 0 (execution verification) works:** Prevented Phase 29-style no-op (confirmed always active)
-4. **Standard procedure validated:** Phase 30 4-step procedure successfully guided Phase 31
-
-## Next Instructions (run Phase 32)
-
-**Phase 31 complete:** NEUTRAL verdict, adopted for code cleanliness → proceed to Phase 32
-
-### Phase 32 推奨候補: `g_hak_tiny_free_calls` (HOT path, TOP PRIORITY) ⭐
-
-**Location:** `core/hakmem_tiny_free.inc:335` (9 lines after Phase 31 target)
-
-**Code Context:**
-```c
-void hak_tiny_free(void* ptr) {
-#if HAKMEM_TINY_FREE_TRACE_COMPILED
-    // Phase 31 target (now compiled-out)
-#endif
-    // Track total tiny free calls (diagnostics)
-    extern _Atomic uint64_t g_hak_tiny_free_calls;
-    atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed);  // ← Phase 32 target
-    // ... rest of function ...
-}
-```
-
-**Classification:**
- **Class:** TELEMETRY (trace macro only)
- **Path:** HOT (every tiny free call)
- **Usage:** Only for `HAK_TRACE` debug macro output
- **ENV Gate:** None (always active)
-
-**Step 0 Verification (inherited from Phase 31):**
- ✅ No ENV gate blocking execution (same function as Phase 31)
- ✅ In `hak_tiny_free()` - called on every tiny free operation
- ✅ Mixed benchmark heavily exercises tiny free path
- ✅ Confirmed: Executes thousands of times per benchmark run (same as Phase 31)
-
-**Expected Impact:** **+0.3% to +0.7%** (smaller than Phase 25: +1.07%, similar to Phase 31: NEUTRAL)
-
-**Implementation Plan:**
-
-**Step 1: Classification (to be done)**
- ❓ Classification needed: TELEMETRY or CORRECTNESS?
- ❓ Check all usage sites with `rg -n "g_hak_tiny_free_calls" core/`
- ❓ Verify no `if` conditions using counter value
- ✅ Expected: Pure TELEMETRY (diagnostic counter)
-
-**Step 2: Compile-out implementation**
-
-a) Add BuildFlags gate:
-```c
-// core/hakmem_build_flags.h
-// ========== Tiny Free Calls Counter Prune (Phase 32) ==========
-#ifndef HAKMEM_TINY_FREE_CALLS_COMPILED
-#  define HAKMEM_TINY_FREE_CALLS_COMPILED 0
-#endif
-```
-
-b) Wrap atomic in `core/hakmem_tiny_free.inc`:
-```c
-void hak_tiny_free(void* ptr) {
-#if HAKMEM_TINY_FREE_TRACE_COMPILED
-    // Phase 31 (already compiled-out)
-#endif
-#if HAKMEM_TINY_FREE_CALLS_COMPILED
-    extern _Atomic uint64_t g_hak_tiny_free_calls;
-    atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed);
-#else
-    (void)0;  // No-op when compiled out
-#endif
-    // ... rest of function ...
-}
-```
-
-**Step 3: A/B Test**
-
-Baseline (COMPILED=0):
-```bash
-make clean && make -j bench_random_mixed_hakmem
-scripts/run_mixed_10_cleanenv.sh
-```
-
-Compiled-in (COMPILED=1):
-```bash
-make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_FREE_CALLS_COMPILED=1' bench_random_mixed_hakmem
-scripts/run_mixed_10_cleanenv.sh
-```
-
-**Expected Result:** +0.3% to +0.7% (possible GO, or NEUTRAL like Phase 31)
-
-**Rationale:**
- Same HOT path as Phase 31 (9 lines below in same function)
- No ENV gate blocking execution (verified in Phase 31)
- Similar profile to Phase 31 (diagnostic counter)
- Moderate confidence: NEUTRAL possible (like Phase 31), but worth trying
-
-### Alternative Candidates (if Phase 32 shows NEUTRAL again)
-
-**#3: `g_p0_class_oob_log` (WARM path, error logging)**
- ❓ Execution uncertain (error path)
- Expected: ±0.0% to +0.2%
- Action: Verify execution first
-
-**#4-#N: Manual review of UNKNOWN atomics (284 candidates)**
- Many may be misclassified by naming heuristics
- Requires deeper code inspection
- Lower priority
-
-**Note:** If Phase 32 is NEUTRAL (like Phase 31), consider pausing HOT path atomic prune and moving to other optimization areas (e.g., inlining, branch optimization, SIMD opportunities).
-
-## References
-
- **Standard Procedure:** `docs/analysis/PHASE30_STANDARD_PROCEDURE.md`
- **Phase 31 Results:** `docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md`
- **Cumulative Summary:** `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
- **mimalloc Gap Analysis:** `docs/roadmap/OPTIMIZATION_ROADMAP.md`
- **Box Theory:** Box Refactor pattern from Phase 6-1.7+
- **Phase 24-27 Pattern:** `core/box/tiny_class_stats_box.h`, `core/hakmem_build_flags.h`
- **Phase 26/31 NEUTRAL Precedent:** Code cleanliness adoption without performance win
-
-## Task Completion Criteria
-
-### Phase 30 criteria, completed (2025-12-16):
-1. ✅ Created `PHASE30_STANDARD_PROCEDURE.md` (4-step procedure)
-2. ✅ Ran the full atomic audit (412 atomics, audit_atomics.sh)
-3. ✅ Extracted HOT/WARM path TELEMETRY candidates
-4. ✅ Step 0 execution verification (all candidates)
-5. ✅ Created `PHASE31_RECOMMENDED_CANDIDATES.md` (TOP 3 prioritized)
-6. ✅ Updated the cumulative summary (Phase 24-30)
-7. ✅ Updated CURRENT_TASK.md (Phase 31 candidates presented)
-
-### Phase 31 completion criteria (2025-12-16):
-1. ✅ Candidate selected (`g_tiny_free_trace`, HOT path)
-2. ✅ Step 0 execution verification complete (no ENV gate, execution confirmed)
-3. ✅ Step 1 classification complete (pure TELEMETRY, no CORRECTNESS use)
-4. ✅ Step 2 implementation (BuildFlags + `#if` wrap)
-5. ✅ Step 3 A/B test (Baseline vs Compiled-in)
-6. ✅ Results document created (PHASE31_RESULTS.md)
-7. ✅ NEUTRAL verdict → adopted for code cleanliness
-
-### Preconditions before starting Phase 32:
-1. ✅ Candidate selected (`g_hak_tiny_free_calls`, HOT path, same function as Phase 31)
-2. ✅ Step 0 execution verification complete (same function as Phase 31, no ENV gate)
-3. ⏳ Step 1 classification (TELEMETRY/CORRECTNESS decision)
-4. ⏳ Step 2 implementation (BuildFlags + `#if` wrap)
-5. ⏳ Step 3 A/B test (Baseline vs Compiled-in)
-6. ⏳ Results document to be created (PHASE32_RESULTS.md)
-
---
-
-**Last Updated:** 2025-12-16
-**Current Phase:** Phase 31 Complete (NEUTRAL -0.35%, adopted for code cleanliness)
-**Next Phase:** Phase 32 (`g_hak_tiny_free_calls`, HOT path, expected +0.3% to +0.7% or NEUTRAL)
-**Cumulative Progress:** +2.74% (18 atomics removed, 412 atomics audited)