Phase 27-28: Unified Cache stats validation + BG Spill audit

Phase 27: Unified Cache Stats A/B Test - GO (+0.74%)
- Target: g_unified_cache_* atomics (6 total) in WARM refill path
- Already implemented in Phase 23 (HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED)
- A/B validation: Baseline 52.94M vs Compiled-in 52.55M ops/s
- Result: +0.74% mean, +1.01% median (both exceed +0.5% GO threshold)
- Impact: WARM path atomics have similar impact to HOT path
- Insight: Refill frequency is substantial, ENV check overhead matters

Phase 28: BG Spill Queue Atomic Audit - NO-OP
- Target: g_bg_spill_* atomics (8 total) in background spill subsystem
- Classification: 8/8 CORRECTNESS (100% untouchable)
- Key finding: g_bg_spill_len is flow control, NOT telemetry
  - Used in queue depth limiting: if (qlen < target) {...}
  - Operational counter (affects behavior), not observational
- Lesson: Counter name ≠ purpose, must trace all usages
- Result: NO-OP (no code changes, audit documentation only)

Cumulative Progress (Phase 24-28):
- Phase 24 (class stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL
- Phase 27 (unified cache): +0.74% GO
- Phase 28 (bg spill): NO-OP (audit only)
- Total: 17 atomics removed, +2.74% improvement

Documentation:
- PHASE27_UNIFIED_CACHE_STATS_RESULTS.md: Complete A/B test report
- PHASE28_BG_SPILL_ATOMIC_AUDIT.md: Detailed CORRECTNESS classification
- PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md: NO-OP verdict and lessons
- ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated with Phase 27-28
- CURRENT_TASK.md: Phase 29 candidate identified (Pool Hotbox v2)

Generated with Claude Code
https://claude.com/claude-code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-16 06:12:17 +09:00
parent 8052e8b320
commit 9ed8b9c79a
5 changed files with 952 additions and 150 deletions


@@ -2,13 +2,15 @@
## Current status (summary)
- **Stable (mainline)**: Phase 26 complete (+2.00% cumulative): hot path atomic audit & compile-out finished
- **Stable (mainline)**: Phase 28 complete (+2.74% cumulative): hot/warm path atomic audit finished (Phase 28 atomics all classified CORRECTNESS)
- **Recent decisions**:
  - Phase 24 (OBSERVE tax prune / tiny_class_stats): ✅ GO (+0.93%)
  - Phase 25 (Free Stats atomic prune / g_free_ss_enter): ✅ GO (+1.07%)
  - Phase 26 (Hot path diagnostic atomics prune / 5 atomics): ⚪ NEUTRAL (-0.33%, adopted for code cleanliness)
  - Phase 27 (Unified Cache Stats atomic prune / 6 atomics): ✅ GO (+0.74% mean, +1.01% median)
  - Phase 28 (Background Spill Queue audit / 8 atomics): ⚪ NO-OP (all CORRECTNESS)
- **Measurement source of truth**: `scripts/run_mixed_10_cleanenv.sh` (same binary / clean env / 10-run)
- **Cumulative effect**: **+2.00%** (Phase 24: +0.93% + Phase 25: +1.07% + Phase 26: NEUTRAL)
- **Cumulative effect**: **+2.74%** (Phase 24: +0.93% + Phase 25: +1.07% + Phase 26: NEUTRAL + Phase 27: +0.74% + Phase 28: NO-OP)
- **Target/current scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
## Principles (Box Theory operating rules)
@@ -25,158 +27,117 @@
- **WARM path** (second priority): refill/spill paths (expected +0.1~0.3%)
- **COLD path** (low priority): init/shutdown (<0.1%, code cleanliness only)
## Phase 26 complete (2025-12-16)
## Phase 28 complete (2025-12-16)
### Work performed
**Goal:** Compile out every telemetry-only atomic on the hot path and fully prune the fixed tax.
**Goal:** Audit the Background Spill Queue atomics and classify each as CORRECTNESS or TELEMETRY.
**Targets:** 5 hot path diagnostic atomics
1. **26A:** `c7_free_count` (tiny_superslab_free.inc.h:51)
2. **26B:** `g_hdr_mismatch_log` (tiny_superslab_free.inc.h:153)
3. **26C:** `g_hdr_meta_mismatch` (tiny_superslab_free.inc.h:195)
4. **26D:** `g_metric_bad_class_once` (hakmem_tiny_alloc.inc:24)
5. **26E:** `g_hdr_meta_fast` (tiny_free_fast_v2.inc.h:183)
**Targets:** 8 background spill queue atomics
1. `atomic_load(&g_bg_spill_head)` × 2 (CAS loop)
2. `atomic_compare_exchange_weak(&g_bg_spill_head)` × 2 (lock-free queue)
3. `atomic_fetch_add(&g_bg_spill_len, 1)` (queue length increment)
4. `atomic_fetch_add(&g_bg_spill_len, count)` (queue length batch increment)
5. `atomic_load(&g_bg_spill_len)` (early-exit optimization)
6. `atomic_fetch_sub(&g_bg_spill_len)` (queue length decrement)
**Implementation:**
- BuildFlagsBox: added 5 compile gates to `core/hakmem_build_flags.h`
  - `HAKMEM_C7_FREE_COUNT_COMPILED` (default: 0)
  - `HAKMEM_HDR_MISMATCH_LOG_COMPILED` (default: 0)
  - `HAKMEM_HDR_META_MISMATCH_COMPILED` (default: 0)
  - `HAKMEM_METRIC_BAD_CLASS_COMPILED` (default: 0)
  - `HAKMEM_HDR_META_FAST_COMPILED` (default: 0)
- Wrapped each atomic in `#if HAKMEM_*_COMPILED`
**Files:** `core/hakmem_tiny_bg_spill.h`, `core/hakmem_tiny_bg_spill.c`
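
To illustrate why the CAS loop and the length counter listed above count as correctness-critical, here is a minimal sketch of a lock-free push and drain in C11. It is not the code in `core/hakmem_tiny_bg_spill.c`: `spill_node`, `demo_head`, `demo_len`, `demo_spill_push`, and `demo_spill_drain` are hypothetical names, and a single queue stands in for the per-class arrays.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type; a real spill queue would link freed blocks. */
typedef struct spill_node {
    struct spill_node *next;
} spill_node;

/* Hypothetical single queue (assumption: the real code is per class). */
static _Atomic(spill_node *) demo_head = NULL;
static _Atomic uint32_t demo_len = 0;

/* Push one node. The load + CAS loop is the synchronization itself. */
static void demo_spill_push(spill_node *n) {
    assert(n != NULL);
    spill_node *old = atomic_load_explicit(&demo_head, memory_order_relaxed);
    do {
        n->next = old;  /* on CAS failure, old is refreshed with the current head */
    } while (!atomic_compare_exchange_weak_explicit(
                 &demo_head, &old, n,
                 memory_order_release, memory_order_relaxed));
    /* The length feeds later depth checks, so it cannot be compiled out. */
    atomic_fetch_add_explicit(&demo_len, 1, memory_order_relaxed);
}

/* Drain: detach the whole list, then subtract the number of nodes taken. */
static uint32_t demo_spill_drain(void) {
    spill_node *list = atomic_exchange_explicit(&demo_head, NULL,
                                                memory_order_acquire);
    uint32_t taken = 0;
    for (spill_node *p = list; p != NULL; p = p->next) {
        taken++;
    }
    if (taken != 0) {
        atomic_fetch_sub_explicit(&demo_len, taken, memory_order_relaxed);
    }
    return taken;
}
```

In this sketch, removing the CAS would lose nodes under contention, and removing the `demo_len` updates would leave any depth check reading a stale value. That matches the CORRECTNESS classification.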
### A/B test results
### Audit results
**Classification:**
- **CORRECTNESS:** 8/8 (100%)
- **TELEMETRY:** 0/8 (0%)
**Key finding:** `g_bg_spill_len` is used for **flow control**, not telemetry:
```c
// core/tiny_free_magazine.inc.h:76-77
uint32_t qlen = atomic_load_explicit(&g_bg_spill_len[class_idx], memory_order_relaxed);
if ((int)qlen < g_bg_spill_target) { // FLOW CONTROL DECISION
// Queue work to background spill
}
```
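
The snippet above can be expanded into a small self-contained sketch that shows how the counter gates behavior. Everything prefixed `demo_` or `DEMO_` is a hypothetical stand-in (class count, target depth, and the function itself are assumptions, not the contents of `tiny_free_magazine.inc.h`).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NUM_CLASSES 8            /* hypothetical class count */

static _Atomic uint32_t demo_bg_spill_len[DEMO_NUM_CLASSES];
static int demo_bg_spill_target = 4;  /* hypothetical depth limit */

/* Returns true if the caller may hand work to the background queue.
 * The counter read decides control flow, so it is an operational counter. */
static bool demo_try_queue(int class_idx, uint32_t count) {
    assert(class_idx >= 0 && class_idx < DEMO_NUM_CLASSES);
    uint32_t qlen = atomic_load_explicit(&demo_bg_spill_len[class_idx],
                                         memory_order_relaxed);
    if ((int)qlen < demo_bg_spill_target) {
        atomic_fetch_add_explicit(&demo_bg_spill_len[class_idx], count,
                                  memory_order_relaxed);
        return true;   /* queued for background spill */
    }
    return false;      /* queue is deep enough: caller handles it inline */
}
```

If the counter were compiled out and always read as 0, `demo_try_queue` would return true on every call and the queue would grow without bound. Note that the load and the add are separate operations here, so the limit is a soft bound under concurrency.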
```
Baseline (compiled-out, default): 53.14 M ops/s (±0.96M)
Compiled-in (all atomics enabled): 53.31 M ops/s (±1.09M)
Difference: -0.33% (NEUTRAL, within ±0.5% noise margin)
```
**Rationale:**
- Lock-free queue CAS operations: CORRECTNESS (synchronization)
- `g_bg_spill_len`: used to cap queue depth (prevents unbounded growth)
- Removing it would change behavior (an operational counter, not an observational one)
### Verdict
**NEUTRAL** → **Keep compiled-out for code cleanliness**
**NO-OP** → **All CORRECTNESS, no changes**
**Rationale:**
- Low execution frequency (error/diagnostic paths only) → no performance impact
- Benchmark variance (~2%) > observed delta (-0.33%)
- Code cleanliness benefit (telemetry removed from the hot path)
- Consistent with the mimalloc principle (no observation on the hot path)
- All atomics are correctness-critical (lock-free queue or flow control)
- `g_bg_spill_len` looks like a telemetry counter but is actually an operational counter
- No A/B test needed (no compile-out candidates)
### Documentation
- `docs/analysis/PHASE26_HOT_PATH_ATOMIC_AUDIT.md` (audit plan)
- `docs/analysis/PHASE26_HOT_PATH_ATOMIC_PRUNE_RESULTS.md` (full report)
- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24+25+26 summary)
- `docs/analysis/PHASE28_BG_SPILL_ATOMIC_AUDIT.md` (detailed audit)
- `docs/analysis/PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md` (full report)
- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24-28 summary)
## Cumulative effect (Phase 24+25+26)
### Lessons learned
**Why classification matters:** Do not judge a counter by its name alone; check every usage site.
- Telemetry counter: observation only (compile-out safe)
- Operational counter: used for flow control (UNTOUCHABLE)
## Cumulative effect (Phase 24+25+26+27+28)
| Phase | Target | Impact | Status |
|-------|--------|--------|--------|
| **24** | `g_tiny_class_stats_*` (5 atomics) | **+0.93%** | GO |
| **25** | `g_free_ss_enter` (1 atomic) | **+1.07%** | GO |
| **26** | Hot path diagnostics (5 atomics) | **-0.33%** | NEUTRAL |
| **Total** | **11 atomics removed** | **+2.00%** | **✅** |
| **27** | `g_unified_cache_*` (6 atomics) | **+0.74%** | GO |
| **28** | Background Spill Queue (8 atomics) | **N/A** | NO-OP |
| **Total** | **17 atomics removed** | **+2.74%** | **✅** |
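**Key Insight:** Atomic execution frequency determines the performance impact.
**Key Insight:** Atomic execution frequency determines the performance impact, and classification matters most.
- High frequency (Phase 24+25): measurable improvement (+0.93%, +1.07%)
- Medium frequency (Phase 27, WARM path): substantial improvement (+0.74%)
- Low frequency (Phase 26): neutral (code cleanliness only)
- **CORRECTNESS atomics (Phase 28): do not touch** (flow control, lock-free sync)
## Next instruction (Phase 27 candidate): Unified Cache Stats Atomic Prune
## Next instruction (Phase 29 candidate selection)
**Aim:** Compile out the telemetry atomics on the warm path (cache refill) to remove additional fixed tax.
**Phase 28 complete:** The Background Spill Queue atomics are all CORRECTNESS → select the next candidate.
### Targets
### Candidate A: Remote Target Queue (WARM, MEDIUM - needs care)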
- **Atomics:** `g_remote_target_len[class_idx]` (fetch_add/sub)
- **File:** `core/hakmem_tiny_remote_target.c`
- **Frequency:** Warm (remote free path)
- **Expected:** +0.1~0.3% (if telemetry)
- **⚠ Warning:** As with `g_bg_spill_len`, this may be flow control (audit required)
**Unified Cache Stats** (warm path, multiple atomics):
- `g_unified_cache_hits_global`
- `g_unified_cache_misses_global`
- `g_unified_cache_refill_cycles_global`
- `g_unified_cache_*_by_class[class_idx]`
### Candidate B: Pool Hotbox v2 Stats (WARM-HOT, HIGH)
- **Atomics:** `g_pool_hotbox_v2_stats[ci].*` (~15 counters)
- **File:** `core/hakmem_pool.c`
- **Frequency:** Medium-High (pool alloc/free operations)
- **Expected:** +0.2~0.5% (if high-frequency)
- **Note:** High priority if it is purely for stats
**File:** `core/front/tiny_unified_cache.c` (multiple locations)
**Frequency:** Warm (cache refill path, medium frequency)
**Expected Gain:** +0.2~0.4%
### Approach (box boundaries)
- BuildFlagsBox: `core/hakmem_build_flags.h`
  - Add `HAKMEM_UNIFIED_CACHE_STATS_COMPILED=0/1` (default: 0)
  - When 0:
    - Compile out all unified cache stats atomics
    - Keep the API/structure intact (do not pollute existing boxes)
### A/B (build-level)
1) **baseline (default compile-out)**
```bash
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh > phase27_baseline.txt
```
2) **compiled-in (research use)**
```bash
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_UNIFIED_CACHE_STATS_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh > phase27_compiled_in.txt
```
### Verdict (conservative policy)
- **GO:** +0.5% or more → adopt on mainline (compiled-out becomes the default)
- **NEUTRAL:** ±0.5% → adopt for code cleanliness (compiled-out becomes the default)
- **NO-GO:** -0.5% or less → revert (compiled-in returns as the default)
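
The thresholds above can be expressed as a small helper. This is a sketch under assumptions: the gain is taken as the compiled-out throughput relative to the compiled-in throughput (the source does not state which build is the denominator), the ±0.5% boundaries are treated as inclusive for GO and NO-GO, and the names `ab_gain_percent` and `ab_classify` are hypothetical.

```c
#include <assert.h>

typedef enum { VERDICT_GO, VERDICT_NEUTRAL, VERDICT_NO_GO } ab_verdict;

/* Relative gain (percent) of the compiled-out build over the compiled-in
 * build. Assumption: compiled-in throughput is the denominator. */
static double ab_gain_percent(double compiled_out_ops, double compiled_in_ops) {
    assert(compiled_in_ops > 0.0);
    return (compiled_out_ops - compiled_in_ops) / compiled_in_ops * 100.0;
}

/* Map a gain to the GO / NEUTRAL / NO-GO policy with a 0.5% threshold. */
static ab_verdict ab_classify(double gain_percent) {
    if (gain_percent >= 0.5) {
        return VERDICT_GO;
    }
    if (gain_percent <= -0.5) {
        return VERDICT_NO_GO;
    }
    return VERDICT_NEUTRAL;
}
```

With the Phase 27 means from the commit message (52.94M vs 52.55M ops/s), this formula gives roughly +0.74%, which classifies as GO.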
### Implementation pattern (same as Phase 24+25+26)
```c
// core/hakmem_build_flags.h
#ifndef HAKMEM_UNIFIED_CACHE_STATS_COMPILED
# define HAKMEM_UNIFIED_CACHE_STATS_COMPILED 0
#endif
// core/front/tiny_unified_cache.c (each site)
#if HAKMEM_UNIFIED_CACHE_STATS_COMPILED
atomic_fetch_add_explicit(&g_unified_cache_hits_global, 1, memory_order_relaxed);
atomic_fetch_add_explicit(&g_unified_cache_hits_by_class[class_idx], 1, memory_order_relaxed);
#else
(void)0; // No-op when compiled out
#endif
```
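
One possible variation on the pattern above is to hide the `#if` behind a macro so call sites stay on one line. The following is only a sketch: `DEMO_STATS_COMPILED`, `DEMO_STAT_INC`, `demo_cache_hits`, and `demo_cache_lookup` are hypothetical names and do not appear in the hakmem sources.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#ifndef DEMO_STATS_COMPILED
#  define DEMO_STATS_COMPILED 0   /* hypothetical gate, compiled out by default */
#endif

#if DEMO_STATS_COMPILED
#  define DEMO_STAT_INC(counter) \
      atomic_fetch_add_explicit(&(counter), 1, memory_order_relaxed)
#else
#  define DEMO_STAT_INC(counter) ((void)0)   /* argument is not evaluated */
#endif

/* Telemetry only: never read to make a decision, so it is safe to gate. */
static _Atomic uint64_t demo_cache_hits;

static uint64_t demo_cache_hits_read(void) {
    return atomic_load_explicit(&demo_cache_hits, memory_order_relaxed);
}

static int demo_cache_lookup(int key) {
    DEMO_STAT_INC(demo_cache_hits);
    return key * 2;  /* stand-in for the real lookup result */
}
```

Building with `-DDEMO_STATS_COMPILED=1` turns the increment back on, while the default build leaves the lookup path free of atomic traffic. This only applies to counters classified as TELEMETRY; an operational counter such as `g_bg_spill_len` must not be wrapped this way.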
### Documentation requirements
After implementation, create the following:
- `docs/analysis/PHASE27_UNIFIED_CACHE_STATS_RESULTS.md`
- Implementation details
- A/B test results (10-run baseline vs compiled-in)
- Verdict & reasoning
- Files modified
- Update `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
  - Add Phase 27
  - Update the cumulative effect
## Future Phase candidates (in priority order)
### Phase 27: Unified Cache Stats (WARM, HIGH PRIORITY)
- **Expected:** +0.2~0.4%
- **File:** `core/front/tiny_unified_cache.c`
- **Atomics:** `g_unified_cache_*` (multiple)
### Phase 28: Background Spill Queue (WARM, MEDIUM - needs classification)
- **Expected:** +0.1~0.2% (if telemetry)
- **File:** `core/hakmem_tiny_bg_spill.h`
- **Atomics:** `g_bg_spill_len`
- **Note:** Correctness check required (the queue length may be used for flow control)
### Phase 29+: Cold Path Stats (COLD, LOW PRIORITY)
### Candidate C: Cold Path Stats (COLD, LOW PRIORITY)
- **Expected:** <0.1% (code cleanliness only)
- **Targets:**
- SS allocation stats (`g_ss_os_alloc_calls`, etc.)
- Shared pool diagnostics (`rel_c7_*`, `dbg_c7_*`)
- Debug trace logs (`g_hak_alloc_at_trace`, etc.)
### Recommendation: Candidate B (Pool Hotbox v2 Stats)
**Rationale:**
- Likely stats-only (low risk of flow control)
- Pool operations are frequent (expected +0.2~0.5%)
- Same kind as the successful Phase 24-27 pattern (high-frequency telemetry)
**Next actions:**
1. grep every usage site of `g_pool_hotbox_v2_stats`
2. Classify each as CORRECTNESS or TELEMETRY
3. If TELEMETRY, implement with the Phase 24-27 pattern & run an A/B test
## References
- **mimalloc Gap Analysis:** `docs/roadmap/OPTIMIZATION_ROADMAP.md`
@@ -187,16 +148,23 @@ scripts/run_mixed_10_cleanenv.sh > phase27_compiled_in.txt
## Task completion criteria
When Phase 27 is complete:
1. `HAKMEM_UNIFIED_CACHE_STATS_COMPILED` flag added
2. Unified cache stats atomics wrapped
3. A/B test run (10-run baseline vs compiled-in)
4. Verdict decided (GO / NEUTRAL / NO-GO)
5. `PHASE27_*_RESULTS.md` created
6. Cumulative summary updated
### Phase 28 completed criteria (2025-12-16):
1. Background spill queue atomic audit complete (8 atomics)
2. CORRECTNESS vs TELEMETRY classification complete (8/8 CORRECTNESS)
3. Confirmed that `g_bg_spill_len` is used for flow control
4. NO-OP verdict (no compile-out candidates)
5. `PHASE28_BG_SPILL_ATOMIC_AUDIT.md` created
6. `PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md` created
7. Cumulative summary updated (Phase 24-28)
### Preconditions before starting Phase 29:
1. Select a candidate (Pool Hotbox v2 Stats recommended)
2. Classify the atomics as CORRECTNESS or TELEMETRY
3. If TELEMETRY: implement with the Phase 24-27 pattern & run an A/B test
4. If CORRECTNESS: skip Phase 29 and select Phase 30+ candidates
---
**Last Updated:** 2025-12-16
**Current Phase:** Phase 26 Complete (+2.00% cumulative)
**Next Phase:** Phase 27 (Unified Cache Stats, warm path)
**Current Phase:** Phase 28 Complete (+2.74% cumulative, 17 atomics removed)
**Next Phase:** Phase 29 (candidates: Pool Hotbox v2 Stats or Remote Target Queue)