Phase 27-28: Unified Cache stats validation + BG Spill audit
Phase 27: Unified Cache Stats A/B Test - GO (+0.74%)
- Target: g_unified_cache_* atomics (6 total) in WARM refill path
- Already implemented in Phase 23 (HAKMEM_TINY_UNIFIED_CACHE_MEASURE_COMPILED)
- A/B validation: Baseline (compiled-out) 52.94M vs Compiled-in 52.55M ops/s
- Result: +0.74% mean, +1.01% median (both exceed +0.5% GO threshold)
- Impact: WARM path atomics have similar impact to HOT path
- Insight: Refill frequency is substantial, ENV check overhead matters
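
The +0.74% mean and the GO verdict above can be reproduced from the two throughput means. The following is a minimal sketch, not code from the HAKMEM tree: the names `ab_gain_percent`, `ab_verdict`, and `ab_verdict_t` are hypothetical, and the NEUTRAL / NO-GO boundaries are taken from the ±0.5% rule described in CURRENT_TASK.md.

```c
/* Verdict categories, mirroring the GO / NEUTRAL / NO-GO rule (hypothetical names). */
typedef enum { AB_NO_GO = -1, AB_NEUTRAL = 0, AB_GO = 1 } ab_verdict_t;

/* Relative gain, in percent, of the compiled-out baseline over the
 * compiled-in build. Both inputs are mean throughput (e.g. M ops/s). */
static double ab_gain_percent(double baseline_ops, double compiled_in_ops) {
    return (baseline_ops - compiled_in_ops) / compiled_in_ops * 100.0;
}

/* Apply the +/-0.5% threshold: >= +0.5 is GO, <= -0.5 is NO-GO,
 * anything in between is treated as noise (NEUTRAL). */
static ab_verdict_t ab_verdict(double gain_percent) {
    if (gain_percent >= 0.5)  return AB_GO;
    if (gain_percent <= -0.5) return AB_NO_GO;
    return AB_NEUTRAL;
}
```

Calling `ab_gain_percent(52.94, 52.55)` gives roughly 0.74, which `ab_verdict` maps to `AB_GO`. The median figure (+1.01%) needs the per-run samples, which this commit message does not include.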
Phase 28: BG Spill Queue Atomic Audit - NO-OP
- Target: g_bg_spill_* atomics (8 total) in background spill subsystem
- Classification: 8/8 CORRECTNESS (100% untouchable)
- Key finding: g_bg_spill_len is flow control, NOT telemetry
  - Used in queue depth limiting: if (qlen < target) {...}
  - Operational counter (affects behavior), not observational
- Lesson: Counter name ≠ purpose, must trace all usages
- Result: NO-OP (no code changes, audit documentation only)
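
The distinction drawn above between an observational counter and an operational one can be illustrated with a small C11 sketch. Everything here is hypothetical (`DEMO_SPILL_STATS_COMPILED`, `demo_queue_len`, `demo_enqueue_calls`, `demo_try_enqueue`, `demo_drained`); it only mirrors the shape of the `if (qlen < target)` check quoted in the audit, not the real `g_bg_spill_*` code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical compile gate for this sketch; 0 compiles the telemetry out. */
#ifndef DEMO_SPILL_STATS_COMPILED
#  define DEMO_SPILL_STATS_COMPILED 0
#endif

/* Operational counter: its value is read to make a flow-control decision,
 * so it cannot be compiled out without changing behavior. */
static _Atomic uint32_t demo_queue_len;

#if DEMO_SPILL_STATS_COMPILED
/* Telemetry counter: only ever incremented and reported, never branched on. */
static _Atomic uint64_t demo_enqueue_calls;
#endif

/* Try to hand one item to the background queue. Returns false when the
 * queue already holds `target` items. The load-then-add pair makes this a
 * soft cap under contention, which is enough for the illustration. */
static bool demo_try_enqueue(uint32_t target) {
#if DEMO_SPILL_STATS_COMPILED
    atomic_fetch_add_explicit(&demo_enqueue_calls, 1, memory_order_relaxed);
#endif
    uint32_t qlen = atomic_load_explicit(&demo_queue_len, memory_order_relaxed);
    if (qlen < target) {  /* flow-control decision depends on the counter */
        atomic_fetch_add_explicit(&demo_queue_len, 1, memory_order_relaxed);
        return true;
    }
    return false;
}

/* Called by the consumer after it has drained `count` items. */
static void demo_drained(uint32_t count) {
    atomic_fetch_sub_explicit(&demo_queue_len, count, memory_order_relaxed);
}
```

Removing `demo_enqueue_calls` leaves the return values of `demo_try_enqueue` unchanged, while removing `demo_queue_len` would make the depth limit disappear. That is the test the audit applies: trace every read of a counter, and treat it as CORRECTNESS if any read feeds a branch.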
Cumulative Progress (Phase 24-28):
- Phase 24 (class stats): +0.93% GO
- Phase 25 (free stats): +1.07% GO
- Phase 26 (diagnostics): -0.33% NEUTRAL
- Phase 27 (unified cache): +0.74% GO
- Phase 28 (bg spill): NO-OP (audit only)
- Total: 17 atomics removed, +2.74% improvement
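
The totals above appear to be plain sums: only GO phases contribute to the percentage (Phase 26 is recorded as NEUTRAL and Phase 28 as NO-OP), while every phase that compiled atomics out contributes to the count. A minimal sketch of that bookkeeping follows; the type and function names are hypothetical, and the per-phase atomic counts are the ones listed in the CURRENT_TASK.md table.

```c
#include <stddef.h>

typedef enum { V_GO, V_NEUTRAL, V_NO_OP } verdict_t;

typedef struct {
    int       phase;
    verdict_t verdict;
    double    gain_percent;     /* measured mean delta for the phase */
    int       atomics_removed;  /* atomics compiled out by the phase */
} phase_result_t;

/* Figures as reported for Phase 24-28. */
static const phase_result_t k_phases[] = {
    {24, V_GO,       0.93, 5},
    {25, V_GO,       1.07, 1},
    {26, V_NEUTRAL, -0.33, 5},  /* adopted for cleanliness, gain not counted */
    {27, V_GO,       0.74, 6},
    {28, V_NO_OP,    0.00, 0},  /* audit only, nothing removed */
};

/* Sum the gains of GO phases only (simple addition, not compounding). */
static double cumulative_gain_percent(const phase_result_t *p, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (p[i].verdict == V_GO) total += p[i].gain_percent;
    }
    return total;
}

/* Sum the atomics removed across all phases, whatever the verdict. */
static int cumulative_atomics_removed(const phase_result_t *p, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) total += p[i].atomics_removed;
    return total;
}
```

With the table above, the two helpers return approximately 2.74 and exactly 17, matching the reported totals.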
Documentation:
- PHASE27_UNIFIED_CACHE_STATS_RESULTS.md: Complete A/B test report
- PHASE28_BG_SPILL_ATOMIC_AUDIT.md: Detailed CORRECTNESS classification
- PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md: NO-OP verdict and lessons
- ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md: Updated with Phase 27-28
- CURRENT_TASK.md: Phase 29 candidate identified (Pool Hotbox v2)
Generated with Claude Code
https://claude.com/claude-code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
220
CURRENT_TASK.md
@@ -2,13 +2,15 @@

 ## Current status (summary)

-- **Stable (mainline)**: Phase 26 complete (+2.00% cumulative): hot path atomic audit and compile-out finished
+- **Stable (mainline)**: Phase 28 complete (+2.74% cumulative): hot/warm path atomic audit finished (all judged CORRECTNESS)
 - **Recent decisions**:
   - Phase 24 (OBSERVE tax prune / tiny_class_stats): ✅ GO (+0.93%)
   - Phase 25 (Free Stats atomic prune / g_free_ss_enter): ✅ GO (+1.07%)
   - Phase 26 (Hot path diagnostic atomics prune / 5 atomics): ⚪ NEUTRAL (-0.33%, adopted for code cleanliness)
+  - Phase 27 (Unified Cache Stats atomic prune / 6 atomics): ✅ GO (+0.74% mean, +1.01% median)
+  - Phase 28 (Background Spill Queue audit / 8 atomics): ⚪ NO-OP (all CORRECTNESS)
 - **Canonical measurement**: `scripts/run_mixed_10_cleanenv.sh` (same binary / clean env / 10-run)
-- **Cumulative effect**: **+2.00%** (Phase 24: +0.93% + Phase 25: +1.07% + Phase 26: NEUTRAL)
+- **Cumulative effect**: **+2.74%** (Phase 24: +0.93% + Phase 25: +1.07% + Phase 26: NEUTRAL + Phase 27: +0.74% + Phase 28: NO-OP)
 - **Target/current scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`

 ## Principles (Box Theory operating rules)
@@ -25,158 +27,117 @@
 - **WARM path**, second priority: refill/spill paths (expected +0.1~0.3%)
 - **COLD path**, low priority: init/shutdown (<0.1%, code cleanliness only)

-## Phase 26 complete (2025-12-16)
+## Phase 28 complete (2025-12-16)

 ### What was done

-**Goal:** Compile out every telemetry-only atomic on the hot path and eliminate the fixed tax entirely.
+**Goal:** Audit the Background Spill Queue atomics and classify each as CORRECTNESS or TELEMETRY.

-**Targets:** 5 hot path diagnostic atomics
-1. **26A:** `c7_free_count` (tiny_superslab_free.inc.h:51)
-2. **26B:** `g_hdr_mismatch_log` (tiny_superslab_free.inc.h:153)
-3. **26C:** `g_hdr_meta_mismatch` (tiny_superslab_free.inc.h:195)
-4. **26D:** `g_metric_bad_class_once` (hakmem_tiny_alloc.inc:24)
-5. **26E:** `g_hdr_meta_fast` (tiny_free_fast_v2.inc.h:183)
+**Targets:** 8 background spill queue atomics
+1. `atomic_load(&g_bg_spill_head)` × 2 (CAS loop)
+2. `atomic_compare_exchange_weak(&g_bg_spill_head)` × 2 (lock-free queue)
+3. `atomic_fetch_add(&g_bg_spill_len, 1)` (queue length increment)
+4. `atomic_fetch_add(&g_bg_spill_len, count)` (queue length batch increment)
+5. `atomic_load(&g_bg_spill_len)` (early-exit optimization)
+6. `atomic_fetch_sub(&g_bg_spill_len)` (queue length decrement)

-**Implementation:**
-- BuildFlagsBox: added 5 compile gates to `core/hakmem_build_flags.h`
-  - `HAKMEM_C7_FREE_COUNT_COMPILED` (default: 0)
-  - `HAKMEM_HDR_MISMATCH_LOG_COMPILED` (default: 0)
-  - `HAKMEM_HDR_META_MISMATCH_COMPILED` (default: 0)
-  - `HAKMEM_METRIC_BAD_CLASS_COMPILED` (default: 0)
-  - `HAKMEM_HDR_META_FAST_COMPILED` (default: 0)
-- Wrapped each atomic in `#if HAKMEM_*_COMPILED`
+**Files:** `core/hakmem_tiny_bg_spill.h`, `core/hakmem_tiny_bg_spill.c`

-### A/B test results
+### Audit results

+**Classification:**
+- **CORRECTNESS:** 8/8 (100%)
+- **TELEMETRY:** 0/8 (0%)
+
+**Key finding:** `g_bg_spill_len` is used for **flow control**, not telemetry:
+```c
+// core/tiny_free_magazine.inc.h:76-77
+uint32_t qlen = atomic_load_explicit(&g_bg_spill_len[class_idx], memory_order_relaxed);
+if ((int)qlen < g_bg_spill_target) { // FLOW CONTROL DECISION
+    // Queue work to background spill
+}
 ```
-Baseline (compiled-out, default): 53.14 M ops/s (±0.96M)
-Compiled-in (all atomics enabled): 53.31 M ops/s (±1.09M)
-Difference: -0.33% (NEUTRAL, within ±0.5% noise margin)
-```

+**Rationale:**
+- Lock-free queue CAS operations: CORRECTNESS (synchronization)
+- `g_bg_spill_len`: used to cap queue depth (prevents unbounded growth)
+- Removing it would change behavior (operational counter, not observational)
+
 ### Verdict

-**NEUTRAL** ➡️ **Keep compiled-out for code cleanliness** ✅
+**NO-OP** ➡️ **All CORRECTNESS, no changes** ✅

 **Rationale:**
-- Executed rarely (error/diagnostic paths only), so no performance impact
-- Benchmark variance (~2%) > observed difference (-0.33%)
-- Code cleanliness benefit (telemetry removed from the hot path)
-- Consistent with the mimalloc principle (no observation on the hot path)
+- Every atomic is correctness-critical (lock-free queue or flow control)
+- `g_bg_spill_len` looks like a telemetry counter but is actually an operational counter
+- No A/B test needed (no compile-out candidates)

 ### Documentation

-- `docs/analysis/PHASE26_HOT_PATH_ATOMIC_AUDIT.md` (audit plan)
-- `docs/analysis/PHASE26_HOT_PATH_ATOMIC_PRUNE_RESULTS.md` (full report)
-- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24+25+26 summary)
+- `docs/analysis/PHASE28_BG_SPILL_ATOMIC_AUDIT.md` (detailed audit)
+- `docs/analysis/PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md` (full report)
+- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24-28 summary)

-## Cumulative effect (Phase 24+25+26)
+### Lessons
+
+**Classification matters:** Do not judge a counter by its name alone; check every usage site.
+- Telemetry counter: observation only (compile-out safe)
+- Operational counter: used for flow control (UNTOUCHABLE)
+
+## Cumulative effect (Phase 24+25+26+27+28)

 | Phase | Target | Impact | Status |
 |-------|--------|--------|--------|
 | **24** | `g_tiny_class_stats_*` (5 atomics) | **+0.93%** | GO ✅ |
 | **25** | `g_free_ss_enter` (1 atomic) | **+1.07%** | GO ✅ |
 | **26** | Hot path diagnostics (5 atomics) | **-0.33%** | NEUTRAL ✅ |
-| **Total** | **11 atomics removed** | **+2.00%** | **✅** |
+| **27** | `g_unified_cache_*` (6 atomics) | **+0.74%** | GO ✅ |
+| **28** | Background Spill Queue (8 atomics) | **N/A** | NO-OP ✅ |
+| **Total** | **17 atomics removed** | **+2.74%** | **✅** |

-**Key Insight:** How often an atomic executes determines its performance impact.
+**Key Insight:** How often an atomic executes determines its performance impact. Classification matters most.
 - High frequency (Phase 24+25): measurable improvement (+0.93%, +1.07%)
+- Medium frequency (Phase 27, WARM path): substantial improvement (+0.74%)
 - Low frequency (Phase 26): neutral (code cleanliness only)
+- **CORRECTNESS atomics (Phase 28): do not touch** (flow control, lock-free sync)

-## Next instructions (Phase 27 candidate: Unified Cache Stats Atomic Prune)
+## Next instructions (selecting the Phase 29 candidate)

-**Aim:** Compile out the telemetry atomics on the warm path (cache refill) to cut additional fixed tax.
+**Phase 28 complete:** The Background Spill Queue is all CORRECTNESS, so select the next candidate

-### Targets
+### Candidate A: Remote Target Queue (WARM, MEDIUM - handle with care)
+- **Atomics:** `g_remote_target_len[class_idx]` (fetch_add/sub)
+- **File:** `core/hakmem_tiny_remote_target.c`
+- **Frequency:** Warm (remote free path)
+- **Expected:** +0.1~0.3% (if telemetry)
+- **⚠️ Warning:** As with `g_bg_spill_len`, this may be flow control (audit required)

-**Unified Cache Stats** (warm path, multiple atomics):
-- `g_unified_cache_hits_global`
-- `g_unified_cache_misses_global`
-- `g_unified_cache_refill_cycles_global`
-- `g_unified_cache_*_by_class[class_idx]`
+### Candidate B: Pool Hotbox v2 Stats (WARM-HOT, HIGH)
+- **Atomics:** `g_pool_hotbox_v2_stats[ci].*` (~15 counters)
+- **File:** `core/hakmem_pool.c`
+- **Frequency:** Medium-High (pool alloc/free operations)
+- **Expected:** +0.2~0.5% (if high-frequency)
+- **Note:** High priority if the counters are purely stats

-**File:** `core/front/tiny_unified_cache.c` (multiple locations)
-**Frequency:** Warm (cache refill path, medium frequency)
-**Expected Gain:** +0.2~0.4%
-
-### Approach (box boundaries)
-
-- BuildFlagsBox: `core/hakmem_build_flags.h`
-  - Add `HAKMEM_UNIFIED_CACHE_STATS_COMPILED=0/1` (default: 0)
-- When 0:
-  - Compile out all unified cache stats atomics
-  - Keep the API/structure intact (do not pollute existing boxes)
-
-### A/B (build-level)
-
-1) **baseline (default compile-out)**
-```bash
-make clean && make -j bench_random_mixed_hakmem
-scripts/run_mixed_10_cleanenv.sh > phase27_baseline.txt
-```
-
-2) **compiled-in (for research)**
-```bash
-make clean && make -j EXTRA_CFLAGS='-DHAKMEM_UNIFIED_CACHE_STATS_COMPILED=1' bench_random_mixed_hakmem
-scripts/run_mixed_10_cleanenv.sh > phase27_compiled_in.txt
-```
-
-### Verdict (conservative policy)
-
-- **GO:** +0.5% or more → adopt on mainline (compiled-out as default)
-- **NEUTRAL:** within ±0.5% → adopt for code cleanliness (compiled-out as default)
-- **NO-GO:** -0.5% or worse → revert (restore compiled-in as default)
-
-### Implementation pattern (same as Phase 24+25+26)
-
-```c
-// core/hakmem_build_flags.h
-#ifndef HAKMEM_UNIFIED_CACHE_STATS_COMPILED
-# define HAKMEM_UNIFIED_CACHE_STATS_COMPILED 0
-#endif
-
-// core/front/tiny_unified_cache.c (each site)
-#if HAKMEM_UNIFIED_CACHE_STATS_COMPILED
-    atomic_fetch_add_explicit(&g_unified_cache_hits_global, 1, memory_order_relaxed);
-    atomic_fetch_add_explicit(&g_unified_cache_hits_by_class[class_idx], 1, memory_order_relaxed);
-#else
-    (void)0; // No-op when compiled out
-#endif
-```
-
-### Documentation requirements
-
-After implementation, create the following:
-- `docs/analysis/PHASE27_UNIFIED_CACHE_STATS_RESULTS.md`
-  - Implementation details
-  - A/B test results (10-run baseline vs compiled-in)
-  - Verdict & reasoning
-  - Files modified
-- Update `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
-  - Add Phase 27
-  - Update the cumulative effect
-
-## Upcoming Phase candidates (in priority order)
-
-### Phase 27: Unified Cache Stats (WARM, HIGH PRIORITY)
-- **Expected:** +0.2~0.4%
-- **File:** `core/front/tiny_unified_cache.c`
-- **Atomics:** `g_unified_cache_*` (several)
-
-### Phase 28: Background Spill Queue (WARM, MEDIUM - needs classification)
-- **Expected:** +0.1~0.2% (if telemetry)
-- **File:** `core/hakmem_tiny_bg_spill.h`
-- **Atomics:** `g_bg_spill_len`
-- **Note:** Correctness check required (the queue length may be used for flow control)
-
-### Phase 29+: Cold Path Stats (COLD, LOW PRIORITY)
+### Candidate C: Cold Path Stats (COLD, LOW PRIORITY)
 - **Expected:** <0.1% (code cleanliness only)
 - **Targets:**
   - SS allocation stats (`g_ss_os_alloc_calls`, etc.)
   - Shared pool diagnostics (`rel_c7_*`, `dbg_c7_*`)
   - Debug trace logs (`g_hak_alloc_at_trace`, etc.)

+### Recommendation: Candidate B (Pool Hotbox v2 Stats)
+
+**Rationale:**
+- Likely stats-only (low risk of flow control)
+- Pool operations are frequent (expected +0.2~0.5%)
+- Same kind of target as the successful Phase 24-27 pattern (high-frequency telemetry)
+
+**Next actions:**
+1. grep every usage of `g_pool_hotbox_v2_stats`
+2. Classify CORRECTNESS vs TELEMETRY
+3. If TELEMETRY, implement with the Phase 24-27 pattern and run the A/B test
+
 ## References

 - **mimalloc Gap Analysis:** `docs/roadmap/OPTIMIZATION_ROADMAP.md`
@@ -187,16 +148,23 @@ scripts/run_mixed_10_cleanenv.sh > phase27_compiled_in.txt

 ## Task completion criteria

-On Phase 27 completion:
-1. ✅ Add the `HAKMEM_UNIFIED_CACHE_STATS_COMPILED` flag
-2. ✅ Wrap all unified cache stats atomics
-3. ✅ Run the A/B test (10-run baseline vs compiled-in)
-4. ✅ Decide the verdict (GO / NEUTRAL / NO-GO)
-5. ✅ Create `PHASE27_*_RESULTS.md`
-6. ✅ Update the cumulative summary
+### Phase 28 completed criteria (2025-12-16):
+1. ✅ Audit of all background spill queue atomics complete (8 atomics)
+2. ✅ CORRECTNESS vs TELEMETRY classification complete (8/8 CORRECTNESS)
+3. ✅ Confirmed that `g_bg_spill_len` is used for flow control
+4. ✅ NO-OP verdict (no compile-out candidates)
+5. ✅ Created `PHASE28_BG_SPILL_ATOMIC_AUDIT.md`
+6. ✅ Created `PHASE28_BG_SPILL_ATOMIC_PRUNE_RESULTS.md`
+7. ✅ Updated the cumulative summary (Phase 24-28)
+
+### Prerequisites before starting Phase 29:
+1. ⏳ Select the candidate (Pool Hotbox v2 Stats recommended)
+2. ⏳ Classify every atomic as CORRECTNESS or TELEMETRY
+3. ⏳ If TELEMETRY: implement with the Phase 24-27 pattern and run the A/B test
+4. ⏳ If CORRECTNESS: skip Phase 29 and select a Phase 30+ candidate

 ---

 **Last Updated:** 2025-12-16
-**Current Phase:** Phase 26 Complete (+2.00% cumulative)
-**Next Phase:** Phase 27 (Unified Cache Stats, warm path)
+**Current Phase:** Phase 28 Complete (+2.74% cumulative, 17 atomics removed)
+**Next Phase:** Phase 29 (candidates: Pool Hotbox v2 Stats or Remote Target Queue)