Phase 27-28: Unified Cache stats validation + BG Spill audit

Phase 27: Unified Cache Stats A/B Test - GO (+0.74%)
- Target: g_unified_cache_* atomics (6 total) in WARM refill path
- Already implemented in Phase 23 (HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED)
- A/B validation: Baseline 52.94M vs Compiled-in 52.55M ops/s
- Result: +0.74% mean, +1.01% median (both exceed +0.5% GO threshold)
- Impact: WARM path atomics have similar impact to HOT path
- Insight: Refill frequency is substantial, ENV check overhead matters

Phase 28: BG Spill Queue Atomic Audit - NO-OP
- Target: g_bg_spill_* atomics (8 total) in background spill subsystem
- Classification: 8/8 CORRECTNESS (100% untouchable)
- Key finding: g_bg_spill_len is flow control, NOT telemetry
  - Used in queue depth limiting: if (qlen < target) {...}
  - Operational counter (affects behavior), not observational
- Lesson: Counter name ≠ purpose, must trace all usages
- Result: NO-OP (no code changes, audit documentation only)

Cumulative Progress (Phase 24-28):
- Phase 24 (class stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL
- Phase 27 (unified cache): +0.74% GO
- Phase 28 (bg spill): NO-OP (audit only)
- Total: 17 atomics removed, +2.74% improvement

Documentation:
- PHASE27_UNIFIED_CACHE_STATS_RESULTS.md: Complete A/B test report
- PHASE28_BG_SPILL_ATOMIC_AUDIT.md: Detailed CORRECTNESS classification
- PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md: NO-OP verdict and lessons
- ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated with Phase 27-28
- CURRENT_TASK.md: Phase 29 candidate identified (Pool Hotbox v2)

Generated with Claude Code https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
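The Phase 28 finding in the commit message, that a length counter feeding `if (qlen < target)` is operational rather than observational, can be illustrated with a minimal C sketch. All `sketch_*` names and the `SKETCH_STATS_COMPILED` gate are hypothetical and chosen for illustration only; they are not identifiers from the repository. The sketch assumes the same load-then-compare shape as the check quoted in the message.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compile gate for this sketch (not a real HAKMEM flag). */
#ifndef SKETCH_STATS_COMPILED
#  define SKETCH_STATS_COMPILED 0
#endif

/* Operational counter: its value is read to make a flow-control decision,
 * so compiling it out would change behavior. */
static _Atomic uint32_t sketch_queue_len;

#if SKETCH_STATS_COMPILED
/* Telemetry counter: only ever incremented, never read by the logic,
 * so it can be compiled out safely. */
static _Atomic uint64_t sketch_enqueue_count;
#endif

/* Returns true if the caller may queue one more item.
 * The limit is soft: the load and the increment are separate relaxed
 * operations, so concurrent callers can briefly overshoot the target. */
static bool sketch_try_enqueue(uint32_t target)
{
    uint32_t qlen = atomic_load_explicit(&sketch_queue_len, memory_order_relaxed);
    if (qlen >= target) {
        return false; /* flow-control decision depends on the counter */
    }
    atomic_fetch_add_explicit(&sketch_queue_len, 1, memory_order_relaxed);
#if SKETCH_STATS_COMPILED
    atomic_fetch_add_explicit(&sketch_enqueue_count, 1, memory_order_relaxed);
#endif
    return true;
}

/* Called once a queued item has been drained. */
static void sketch_dequeue_done(void)
{
    atomic_fetch_sub_explicit(&sketch_queue_len, 1, memory_order_relaxed);
}
```

Removing `sketch_enqueue_count` leaves the return values of `sketch_try_enqueue` unchanged, whereas removing `sketch_queue_len` would make the depth check meaningless. That asymmetry is the distinction the audit draws between TELEMETRY and CORRECTNESS.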
2025-12-16 06:12:17 +09:00
parent 8052e8b320
commit 9ed8b9c79a
5 changed files with 952 additions and 150 deletions
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -2,13 +2,15 @@

 ## Current status (summary)

- **Stable (mainline)**: Phase 26 complete (+2.00% cumulative), hot path atomic audit & compile-out finished
+- **Stable (mainline)**: Phase 28 complete (+2.74% cumulative), hot/warm path atomic audit finished (all Phase 28 atomics classified CORRECTNESS)
 - **Recent decisions**:
  - Phase 24 (OBSERVE tax prune / tiny_class_stats): ✅ GO (+0.93%)
  - Phase 25 (Free Stats atomic prune / g_free_ss_enter): ✅ GO (+1.07%)
  - Phase 26 (Hot path diagnostic atomics prune / 5 atomics): ⚪ NEUTRAL (-0.33%, adopted for code cleanliness)
+  - Phase 27 (Unified Cache Stats atomic prune / 6 atomics): ✅ GO (+0.74% mean, +1.01% median)
+  - Phase 28 (Background Spill Queue audit / 8 atomics): ⚪ NO-OP (all CORRECTNESS)
 - **Measurement source of truth**: `scripts/run_mixed_10_cleanenv.sh` (same binary / clean env / 10-run)
- **Cumulative effect**: **+2.00%** (Phase 24: +0.93% + Phase 25: +1.07% + Phase 26: NEUTRAL)
+- **Cumulative effect**: **+2.74%** (Phase 24: +0.93% + Phase 25: +1.07% + Phase 26: NEUTRAL + Phase 27: +0.74% + Phase 28: NO-OP)
 - **Target/current scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`

 ## Principles (Box Theory operating rules)
@@ -25,158 +27,117 @@
  - **WARM path**, next in priority: refill/spill paths (+0.1~0.3% expected)
  - **COLD path**, low priority: init/shutdown (<0.1%, code cleanliness only)

-## Phase 26 complete (2025-12-16)
+## Phase 28 complete (2025-12-16)

 ### What was done

-**Goal:** Compile out every telemetry-only atomic on the hot path to remove the fixed tax entirely.
+**Goal:** Audit the Background Spill Queue atomics and classify each as CORRECTNESS or TELEMETRY.

-**Targets:** 5 hot path diagnostic atomics
-1. **26A:** `c7_free_count` (tiny_superslab_free.inc.h:51)
-2. **26B:** `g_hdr_mismatch_log` (tiny_superslab_free.inc.h:153)
-3. **26C:** `g_hdr_meta_mismatch` (tiny_superslab_free.inc.h:195)
-4. **26D:** `g_metric_bad_class_once` (hakmem_tiny_alloc.inc:24)
-5. **26E:** `g_hdr_meta_fast` (tiny_free_fast_v2.inc.h:183)
+**Targets:** 8 background spill queue atomics
+1. `atomic_load(&g_bg_spill_head)` × 2 (CAS loop)
+2. `atomic_compare_exchange_weak(&g_bg_spill_head)` × 2 (lock-free queue)
+3. `atomic_fetch_add(&g_bg_spill_len, 1)` (queue length increment)
+4. `atomic_fetch_add(&g_bg_spill_len, count)` (queue length batch increment)
+5. `atomic_load(&g_bg_spill_len)` (early-exit optimization)
+6. `atomic_fetch_sub(&g_bg_spill_len)` (queue length decrement)

-**Implementation:**
- BuildFlagsBox: added 5 compile gates to `core/hakmem_build_flags.h`
-  - `HAKMEM_C7_FREE_COUNT_COMPILED` (default: 0)
-  - `HAKMEM_HDR_MISMATCH_LOG_COMPILED` (default: 0)
-  - `HAKMEM_HDR_META_MISMATCH_COMPILED` (default: 0)
-  - `HAKMEM_METRIC_BAD_CLASS_COMPILED` (default: 0)
-  - `HAKMEM_HDR_META_FAST_COMPILED` (default: 0)
- Wrapped each atomic in `#if HAKMEM_*_COMPILED`
+**Files:** `core/hakmem_tiny_bg_spill.h`, `core/hakmem_tiny_bg_spill.c`

-### A/B test results
+### Audit results

+**Classification:**
+- **CORRECTNESS:** 8/8 (100%)
+- **TELEMETRY:** 0/8 (0%)
+
+**Key finding:** `g_bg_spill_len` is used for **flow control**, not telemetry:
+```c
+// core/tiny_free_magazine.inc.h:76-77
+uint32_t qlen = atomic_load_explicit(&g_bg_spill_len[class_idx], memory_order_relaxed);
+if ((int)qlen < g_bg_spill_target) {  // FLOW CONTROL DECISION
+    // Queue work to background spill
+}
 ```
-Baseline (compiled-out, default):  53.14 M ops/s (±0.96M)
-Compiled-in (all atomics enabled): 53.31 M ops/s (±1.09M)
-Difference: -0.33% (NEUTRAL, within ±0.5% noise margin)
-```
+
+**Rationale:**
+- Lock-free queue CAS operations: CORRECTNESS (synchronization)
+- `g_bg_spill_len`: used to limit queue depth (prevents unbounded growth)
+- Removing it would change behavior (an operational counter, not an observational one)

 ### Verdict

-**NEUTRAL** ➡️ **Keep compiled-out for code cleanliness** ✅
+**NO-OP** ➡️ **All CORRECTNESS, no changes** ✅

 **Rationale:**
- Low execution frequency (error/diagnostic paths only) → no performance impact
- Benchmark variance (~2%) > observed difference (-0.33%)
- Code cleanliness benefit (telemetry removed from the hot path)
- Consistent with the mimalloc principle (no observation on the hot path)
+- Every atomic is correctness-critical (lock-free queue or flow control)
+- `g_bg_spill_len` looks like a telemetry counter but is actually an operational counter
+- No A/B test needed (no compile-out candidates)

 ### Documentation

- `docs/analysis/PHASE26_HOT_PATH_ATOMIC_AUDIT.md` (audit plan)
- `docs/analysis/PHASE26_HOT_PATH_ATOMIC_PRUNE_RESULTS.md` (full report)
- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24+25+26 summary)
+- `docs/analysis/PHASE28_BG_SPILL_ATOMIC_AUDIT.md` (detailed audit)
+- `docs/analysis/PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md` (full report)
+- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24-28 summary)

-## Cumulative effect (Phase 24+25+26)
+### Lessons learned
+
+**Why classification matters:** Do not judge a counter by its name alone; check every place it is used.
+- Telemetry counter: observation only (compile-out safe)
+- Operational counter: used for flow control (UNTOUCHABLE)
+
+## Cumulative effect (Phase 24+25+26+27+28)

 | Phase | Target | Impact | Status |
 |-------|--------|--------|--------|
 | **24** | `g_tiny_class_stats_*` (5 atomics) | **+0.93%** | GO ✅ |
 | **25** | `g_free_ss_enter` (1 atomic) | **+1.07%** | GO ✅ |
 | **26** | Hot path diagnostics (5 atomics) | **-0.33%** | NEUTRAL ✅ |
-| **Total** | **11 atomics removed** | **+2.00%** | **✅** |
+| **27** | `g_unified_cache_*` (6 atomics) | **+0.74%** | GO ✅ |
+| **28** | Background Spill Queue (8 atomics) | **N/A** | NO-OP ✅ |
+| **Total** | **17 atomics removed** | **+2.74%** | **✅** |

-**Key Insight:** Atomic execution frequency determines the performance impact.
+**Key Insight:** Atomic execution frequency determines the performance impact. Classification matters most.
 - High frequency (Phase 24+25): measurable improvement (+0.93%, +1.07%)
+- Medium frequency (Phase 27, WARM path): substantial improvement (+0.74%)
 - Low frequency (Phase 26): neutral (code cleanliness only)
+- **CORRECTNESS atomics (Phase 28): do not touch** (flow control, lock-free sync)

-## Next instruction (Phase 27 candidate: Unified Cache Stats Atomic Prune)
+## Next instruction (Phase 29 candidate selection)

-**Aim:** Compile out the telemetry atomics on the warm path (cache refill) to cut further fixed tax.
+**Phase 28 complete:** the Background Spill Queue atomics are all CORRECTNESS → select the next candidate

-### Targets
+### Candidate A: Remote Target Queue (WARM, MEDIUM - caution required)
+- **Atomics:** `g_remote_target_len[class_idx]` (fetch_add/sub)
+- **File:** `core/hakmem_tiny_remote_target.c`
+- **Frequency:** Warm (remote free path)
+- **Expected:** +0.1~0.3% (if telemetry)
+- **⚠️ Warning:** like `g_bg_spill_len`, this may be flow control (audit required)

-**Unified Cache Stats** (warm path, multiple atomics):
- `g_unified_cache_hits_global`
- `g_unified_cache_misses_global`
- `g_unified_cache_refill_cycles_global`
- `g_unified_cache_*_by_class[class_idx]`
+### Candidate B: Pool Hotbox v2 Stats (WARM-HOT, HIGH)
+- **Atomics:** `g_pool_hotbox_v2_stats[ci].*` (~15 counters)
+- **File:** `core/hakmem_pool.c`
+- **Frequency:** Medium-High (pool alloc/free operations)
+- **Expected:** +0.2~0.5% (if high-frequency)
+- **Note:** high priority if the counters are purely stats-only

-**File:** `core/front/tiny_unified_cache.c` (multiple locations)
-**Frequency:** Warm (cache refill path, medium frequency)
-**Expected Gain:** +0.2~0.4%
-
-### Approach (box boundaries)
-
- BuildFlagsBox: `core/hakmem_build_flags.h`
-  - Add `HAKMEM_UNIFIED_CACHE_STATS_COMPILED=0/1` (default: 0)
- When 0:
-  - Compile out all unified cache stats atomics
-  - Keep the API/structure (do not pollute existing boxes)
-
-### A/B (build-level)
-
-1) **baseline (default compile-out)**
-```bash
-make clean && make -j bench_random_mixed_hakmem
-scripts/run_mixed_10_cleanenv.sh > phase27_baseline.txt
-```
-
-2) **compiled-in (for research)**
-```bash
-make clean && make -j EXTRA_CFLAGS='-DHAKMEM_UNIFIED_CACHE_STATS_COMPILED=1' bench_random_mixed_hakmem
-scripts/run_mixed_10_cleanenv.sh > phase27_compiled_in.txt
-```
-
-### Verdict (conservative operation)
-
- **GO:** +0.5% or more → adopt on mainline (compiled-out becomes the default)
- **NEUTRAL:** within ±0.5% → adopt for code cleanliness (compiled-out becomes the default)
- **NO-GO:** -0.5% or less → revert (compiled-in returns as the default)
-
-### Implementation pattern (same as Phase 24+25+26)
-
-```c
-// core/hakmem_build_flags.h
-#ifndef HAKMEM_UNIFIED_CACHE_STATS_COMPILED
-#  define HAKMEM_UNIFIED_CACHE_STATS_COMPILED 0
-#endif
-
-// core/front/tiny_unified_cache.c (at each site)
-#if HAKMEM_UNIFIED_CACHE_STATS_COMPILED
-    atomic_fetch_add_explicit(&g_unified_cache_hits_global, 1, memory_order_relaxed);
-    atomic_fetch_add_explicit(&g_unified_cache_hits_by_class[class_idx], 1, memory_order_relaxed);
-#else
-    (void)0;  // No-op when compiled out
-#endif
-```
-
-### Documentation requirements
-
-After implementation, create the following:
- `docs/analysis/PHASE27_UNIFIED_CACHE_STATS_RESULTS.md`
-  - Implementation details
-  - A/B test results (10-run baseline vs compiled-in)
-  - Verdict & reasoning
-  - Files modified
- Update `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
-  - Add Phase 27
-  - Update the cumulative effect
-
-## Future phase candidates (in priority order)
-
-### Phase 27: Unified Cache Stats (WARM, HIGH PRIORITY)
- **Expected:** +0.2~0.4%
- **File:** `core/front/tiny_unified_cache.c`
- **Atomics:** `g_unified_cache_*` (multiple)
-
-### Phase 28: Background Spill Queue (WARM, MEDIUM - classification required)
- **Expected:** +0.1~0.2% (if telemetry)
- **File:** `core/hakmem_tiny_bg_spill.h`
- **Atomics:** `g_bg_spill_len`
- **Note:** Correctness check required (the queue length may be used for flow control)
-
-### Phase 29+: Cold Path Stats (COLD, LOW PRIORITY)
+### Candidate C: Cold Path Stats (COLD, LOW PRIORITY)
 - **Expected:** <0.1% (code cleanliness only)
 - **Targets:**
  - SS allocation stats (`g_ss_os_alloc_calls`, etc.)
  - Shared pool diagnostics (`rel_c7_*`, `dbg_c7_*`)
  - Debug trace logs (`g_hak_alloc_at_trace`, etc.)

+### Recommendation: Candidate B (Pool Hotbox v2 Stats)
+
+**Rationale:**
+- Likely stats-only (low risk of flow-control use)
+- Pool operations are frequent (+0.2~0.5% expected)
+- Same kind of target as the successful Phase 24-27 pattern (high-frequency telemetry)
+
+**Next actions:**
+1. grep every use of `g_pool_hotbox_v2_stats`
+2. Classify each use as CORRECTNESS or TELEMETRY
+3. If TELEMETRY, implement with the Phase 24-27 pattern & run the A/B test
+
 ## References

 - **mimalloc Gap Analysis:** `docs/roadmap/OPTIMIZATION_ROADMAP.md`
@@ -187,16 +148,23 @@ scripts/run_mixed_10_cleanenv.sh > phase27_compiled_in.txt

 ## Task completion criteria

-When Phase 27 is complete:
-1. ✅ Add the `HAKMEM_UNIFIED_CACHE_STATS_COMPILED` flag
-2. ✅ Wrap all unified cache stats atomics
-3. ✅ Run the A/B test (10-run baseline vs compiled-in)
-4. ✅ Decide the verdict (GO / NEUTRAL / NO-GO)
-5. ✅ Create `PHASE27_*_RESULTS.md`
-6. ✅ Update the cumulative summary
+### Phase 28 criteria met (2025-12-16):
+1. ✅ Audit of all background spill queue atomics complete (8 atomics)
+2. ✅ CORRECTNESS vs TELEMETRY classification complete (8/8 CORRECTNESS)
+3. ✅ Flow-control use of `g_bg_spill_len` confirmed
+4. ✅ NO-OP verdict (no compile-out candidates)
+5. ✅ `PHASE28_BG_SPILL_ATOMIC_AUDIT.md` created
+6. ✅ `PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md` created
+7. ✅ Cumulative summary updated (Phase 24-28)
+
+### Preconditions before starting Phase 29:
+1. ⏳ Select a candidate (Pool Hotbox v2 Stats recommended)
+2. ⏳ Classify every atomic as CORRECTNESS or TELEMETRY
+3. ⏳ If TELEMETRY: implement with the Phase 24-27 pattern & run the A/B test
+4. ⏳ If CORRECTNESS: skip Phase 29 and select a Phase 30+ candidate

 ---

 **Last Updated:** 2025-12-16
-**Current Phase:** Phase 26 Complete (+2.00% cumulative)
-**Next Phase:** Phase 27 (Unified Cache Stats, warm path)
+**Current Phase:** Phase 28 Complete (+2.74% cumulative, 17 atomics removed)
+**Next Phase:** Phase 29 (candidates: Pool Hotbox v2 Stats or Remote Target Queue)
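
The GO / NEUTRAL / NO-GO thresholds listed in the removed Phase 27 plan of this diff (+0.5% or more, within ±0.5%, -0.5% or less) can be sketched as a small C helper. The function and enum names are hypothetical, and the sketch assumes that "Baseline" in the commit message means the default compiled-out build, as it does in the Phase 26 results.

```c
/* Hypothetical verdict type for this sketch; not part of the repository. */
typedef enum {
    SKETCH_NO_GO   = -1,
    SKETCH_NEUTRAL = 0,
    SKETCH_GO      = 1
} sketch_verdict_t;

/* Relative gain, in percent, of the compiled-out build over the
 * compiled-in build. Inputs are mean throughputs in M ops/s. */
static double sketch_gain_pct(double compiled_out_mops, double compiled_in_mops)
{
    return (compiled_out_mops / compiled_in_mops - 1.0) * 100.0;
}

/* Applies the thresholds from the plan: >= +0.5% is GO,
 * <= -0.5% is NO-GO, anything in between is NEUTRAL. */
static sketch_verdict_t sketch_verdict(double gain_pct)
{
    if (gain_pct >= 0.5) {
        return SKETCH_GO;
    }
    if (gain_pct <= -0.5) {
        return SKETCH_NO_GO;
    }
    return SKETCH_NEUTRAL;
}
```

With the Phase 27 means from the commit message (52.94M vs 52.55M ops/s), `sketch_gain_pct` yields roughly +0.74%, which falls in the GO band.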