Phase 35-39: FAST build optimization complete (+7.13% cumulative)

Phase 35-A: BENCH_MINIMAL gate function elimination (GO +4.39%)
- tiny_front_v3_enabled() → constant true
- tiny_metadata_cache_enabled() → constant 0
- learner_v7_enabled() → constant false
- small_learner_v2_enabled() → constant false

Phase 36: Policy snapshot init-once (GO +0.71%)
- small_policy_v7_snapshot() version check skip in BENCH_MINIMAL
- TLS cache for policy snapshot

Phase 37: Standard TLS cache (NO-GO -0.07%)
- TLS cache for Standard build attempted
- Runtime gate overhead negates benefit

Phase 38: FAST/OBSERVE/Standard workflow established
- make perf_fast, make perf_observe targets
- Scorecard and documentation updates

Phase 39: Hot path gate constantization (GO +1.98%)
- front_gate_unified_enabled() → constant 1
- alloc_dualhot_enabled() → constant 0
- g_bench_fast_front, g_v3_enabled blocks → compile-out
- free_dispatch_stats_enabled() → constant false

Results:
- FAST v3: 56.04M ops/s (47.4% of mimalloc)
- Standard: 53.50M ops/s (45.3% of mimalloc)
- M1 target (50% of mimalloc): FAST v3 needs a further +5.5% (50 / 47.4 ≈ 1.055)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Moe Charm (CI)
2025-12-16 15:01:56 +09:00
parent 506e724c3b
commit b7085c47e1
22 changed files with 1550 additions and 397 deletions


@ -1,362 +1,44 @@
# Mainline Task (Current)
# CURRENT_TASK (Rolling)
## Current State (Summary)
## 0) Current source of truth (Phase 39)
- **Stable (mainline)**: Phase 31 complete (`g_tiny_free_trace` compile-out; NEUTRAL verdict, adopted for code cleanliness)
- **Recent verdicts**:
- Phase 24 (OBSERVE tax prune / tiny_class_stats): ✅ GO (+0.93%)
- Phase 25 (Free Stats atomic prune / g_free_ss_enter): ✅ GO (+1.07%)
- Phase 26 (Hot path diagnostic atomics prune / 5 atomics): ⚪ NEUTRAL (-0.33%, adopted for code cleanliness)
- Phase 27 (Unified Cache Stats atomic prune / 6 atomics): ✅ GO (+0.74% mean, +1.01% median)
- Phase 28 (Background Spill Queue audit / 8 atomics): ⚪ NO-OP (all CORRECTNESS)
- Phase 29 (Pool Hotbox v2 Stats audit / 12 atomics): ⚪ NO-OP (ENV-gated, never executed)
- Phase 30 (Standard Procedure Documentation): ✅ PROCEDURE COMPLETE (audit of 412 atomics finished)
- Phase 31 (Tiny Free Trace atomic prune / g_tiny_free_trace): ⚪ NEUTRAL (-0.35%, adopted for code cleanliness)
- **Measurement source of truth**: `scripts/run_mixed_10_cleanenv.sh` (same binary / clean env / 10-run)
- **Cumulative effect**: **+2.74%** (Phase 24: +0.93% + Phase 25: +1.07% + Phase 26: NEUTRAL + Phase 27: +0.74% + Phase 28: NO-OP + Phase 29: NO-OP + Phase 30: PROCEDURE + Phase 31: NEUTRAL)
- **Target/current scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
- **Source of truth for performance comparison**: **FAST build** (`make perf_fast`)
- **Source of truth for safety/compatibility**: Standard build (`make bench_random_mixed_hakmem`)
- **Source of truth for observation**: OBSERVE build (`make perf_observe`)
- **Scorecard**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
- **Measurement source of truth (Mixed 10-run)**: `scripts/run_mixed_10_cleanenv.sh` (`ITERS=20000000 WS=400`)
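## Principles (Box Theory operating rules)
## 1) Current status (latest snapshot)
- FAST v3: **56.04M ops/s** (**47.4%** of mimalloc)
- Standard: **53.50M ops/s** (**45.3%** of mimalloc)
Note: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` is authoritative for details (only the key points are listed here).
## 2) Principles (Box Theory operation)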
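- Split changes into boxes (revertible via ENV / build flag)
- Consolidate the conversion point (boundary) in one place
- "Delete to speed up" is dangerous (the result can flip with layout/LTO)
- ✅ compile-out (`#if HAKMEM_*_COMPILED`) is allowed
- ❌ link-out (removing `.o` files from the Makefile) is sealed off (Phase 22-2 NO-GO)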
- **Atomic audit principles** (standardized in Phase 30):
- **Step 0: Execution check (MANDATORY)**: check ENV gates / execution counters (Phase 29 lesson)
- **Step 1: CORRECTNESS vs TELEMETRY classification**: used in an `if` condition = CORRECTNESS (Phase 28 lesson)
- **Step 2: Compile-out implementation**: wrap with `#if HAKMEM_*_COMPILED`
- **Step 3: A/B test**: Baseline vs Compiled-in (10-run comparison)
- **Verdict**: GO (+0.5% or better), NEUTRAL (within ±0.5%), NO-GO (-0.5% or worse)
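- One boundary only (do not add conversion points)
- **"Delete to speed up" (link-out / large deletions) is sealed off** (the sign flips with layout/LTO)
- ✅ compile-out (`#if HAKMEM_*_COMPILED` / `#if HAKMEM_BENCH_MINIMAL`) is allowed
- ❌ As a rule, do not remove `.o` files from the Makefile or physically delete code (Phase 22-2 NO-GO)
- Toggle A/B within the **same binary** (ENV / build flag); comparing separate binaries mixes in layout effects.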
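The compile-out rule can be illustrated with a minimal sketch of gate constantization. The gate name `tiny_front_v3_enabled()` and the `HAKMEM_BENCH_MINIMAL` flag appear in this log; the environment variable `HAKMEM_TINY_FRONT_V3`, the lazy-init logic, and `alloc_path()` are assumptions made only for illustration and are not taken from the actual source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#ifndef HAKMEM_BENCH_MINIMAL
#  define HAKMEM_BENCH_MINIMAL 0   /* default: Standard build keeps the runtime gate */
#endif

#if HAKMEM_BENCH_MINIMAL
/* FAST build: the gate is a compile-time constant, so the compiler can
 * fold the branch away at every call site. */
static inline bool tiny_front_v3_enabled(void) { return true; }
#else
/* Standard build: read a (hypothetical) ENV variable once and cache it.
 * Not thread-safe; this is a sketch, not production code. */
static inline bool tiny_front_v3_enabled(void) {
    static int cached = -1;                               /* -1 = not initialized */
    if (cached < 0) {
        const char *e = getenv("HAKMEM_TINY_FRONT_V3");   /* assumed name */
        cached = (e == NULL || e[0] != '0') ? 1 : 0;      /* default ON */
    }
    return cached != 0;
}
#endif

/* The call site is identical in both builds; only the gate's cost changes. */
static inline int alloc_path(void) {
    return tiny_front_v3_enabled() ? 3 : 1;   /* 3 = v3 front, 1 = legacy path */
}
```

Because the code stays in the tree and is only switched by a build flag, the change remains revertible, which is the property the rules above ask for.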
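## Phase 30 complete (2025-12-16)
## 3) Next instructions
### Work performed
TBD (Phase 39 complete)
**Goal:** Codify the lessons of Phases 24-29 as a 4-step standard procedure and select the Phase 31 candidates.
## 4) Recent log (key points only)
**Deliverables:**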
1. `docs/analysis/PHASE30_STANDARD_PROCEDURE.md` - 4-step standard procedure document
2. `docs/analysis/ATOMIC_AUDIT_FULL.txt` - full atomic audit results (412 atomics)
3. `docs/analysis/PHASE31_CANDIDATES_HOT.txt` - extracted HOT path candidates
4. `docs/analysis/PHASE31_CANDIDATES_WARM.txt` - extracted WARM path candidates
5. `docs/analysis/PHASE31_RECOMMENDED_CANDIDATES.md` - recommended Phase 31 candidates (TOP 3)
6. `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` updated (Phase 30 appended)
- Phase 24-34: atomic prune, cumulative **+2.74%** (diminishing returns afterwards)
- Phase 35-A: `HAKMEM_BENCH_MINIMAL=1` (gate prune) **GO +4.39%**
- Phase 36: FAST-only policy snapshot optimization **GO +0.71%**
- Phase 37: Standard TLS cache **NO-GO** (the runtime gate tax outweighs the benefit)
- Phase 38: FAST/OBSERVE/Standard workflow established (scorecard + Makefile targets)
- Phase 39: FAST v3 gate constantization **GO +1.98%**
- Detailed results: `docs/analysis/PHASE39_FAST_V3_GATE_CONSTANTIZATION_RESULTS.md`
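As a rough illustration of the Phase 36 entry (policy snapshot init-once with a TLS cache, skipping the version check in the FAST build), the sketch below contrasts the two paths. Every `demo_*` name, the struct layout, and the global version counter are hypothetical; the real `small_policy_v7_snapshot()` may be structured quite differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#ifndef HAKMEM_BENCH_MINIMAL
#  define HAKMEM_BENCH_MINIMAL 0
#endif

typedef struct { uint32_t version; int route; } demo_policy_t;   /* hypothetical layout */

static _Atomic uint32_t g_demo_policy_version = 1;    /* bumped when the policy changes */
static _Thread_local demo_policy_t t_demo_snapshot;   /* per-thread cached copy */
static _Thread_local int t_demo_snapshot_ready;

static void demo_policy_reload(uint32_t version) {
    t_demo_snapshot.version = version;
    t_demo_snapshot.route = (int)version * 10;   /* stand-in for reading the real policy */
    t_demo_snapshot_ready = 1;
}

static const demo_policy_t *demo_policy_snapshot(void) {
#if HAKMEM_BENCH_MINIMAL
    /* FAST build: initialize once per thread and never re-check the version. */
    if (!t_demo_snapshot_ready) {
        demo_policy_reload(atomic_load_explicit(&g_demo_policy_version, memory_order_acquire));
    }
#else
    /* Standard build: re-validate against the global version on every call. */
    uint32_t v = atomic_load_explicit(&g_demo_policy_version, memory_order_acquire);
    if (!t_demo_snapshot_ready || t_demo_snapshot.version != v) {
        demo_policy_reload(v);
    }
#endif
    return &t_demo_snapshot;
}
```

In this sketch the FAST path trades the ability to pick up policy changes at runtime for one fewer atomic load and compare per call, which is only acceptable in a benchmark-only build.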
### Audit results
## 5) Archive
**Full atomic audit:**
- **Total atomics:** 412
- **TELEMETRY:** 104 (25%)
- **CORRECTNESS:** 24 (6%)
- **UNKNOWN:** 284 (69%, manual review needed)
- The previous `CURRENT_TASK.md` (detailed log) is in `archive/CURRENT_TASK_ARCHIVE_20251216.md`
**Path classification:**
- **HOT path:** 16 atomics (5 TELEMETRY, 11 UNKNOWN)
- **WARM path:** 10 atomics (3 TELEMETRY, 7 UNKNOWN)
- **COLD path:** 386 atomics (remaining)
**NEW candidates (not yet compiled out):**
- **HOT path:** 1 candidate (`g_tiny_free_trace`)
- **WARM path:** 3 candidates (`rel_logs`, `dbg_logs`, `g_p0_class_oob_log`)
### Step 0 execution check results
**HOT path:**
1. `g_tiny_free_trace` (HOT, TELEMETRY)
- ✅ No ENV gate
- ✅ Executed in `hak_tiny_free()` (every call)
- ✅ Execution verified
- **Verdict:** **TOP PRIORITY for Phase 31**
**WARM path:**
1. `rel_logs` + `dbg_logs` (WARM, TELEMETRY)
- ❌ ENV gated by `HAKMEM_TINY_WARM_LOG` (OFF by default)
- ❌ Never executed (Phase 29 pattern)
- **Verdict:** SKIP
2. `g_p0_class_oob_log` (WARM, TELEMETRY)
- ✅ No ENV gate
- ⚠️ Error path (out-of-bounds class index)
- ❓ Execution frequency unknown (needs verification)
- **Verdict:** LOW PRIORITY (Phase 32 candidate)
### 4-Step Standard Procedure
**Pattern established in Phase 30:**
**Step 0: Execution check (NEW - Phase 29 lesson)**
- Check for ENV gates (`rg "getenv.*FEATURE" core/`)
- Check execution counters (> 0 in Mixed 10-run)
- Verify with perf/flamegraph (optional)
- **Decision:** ❌ Never executed → SKIP
**Step 1: CORRECTNESS/TELEMETRY classification (Phase 28 lesson)**
- Trace every usage site (`rg -n "g_variable" core/`)
- Used in an `if` condition → CORRECTNESS (DO NOT TOUCH)
- Used only in `fprintf` output → TELEMETRY (compile-out candidate)
- **Decision:** CORRECTNESS → DO NOT TOUCH
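To make the Step 1 distinction concrete, here is a small sketch with two counters that look alike but play different roles. All names (`g_demo_queue_len`, `g_demo_push_calls`, `SPILL_QUEUE_LIMIT`) are hypothetical and were invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SPILL_QUEUE_LIMIT 4u   /* hypothetical capacity */

/* CORRECTNESS: the value feeds an `if`, so removing it changes behavior. */
static _Atomic uint32_t g_demo_queue_len;
/* TELEMETRY: only ever incremented and printed. */
static _Atomic uint64_t g_demo_push_calls;

static bool demo_queue_push(void) {
    atomic_fetch_add_explicit(&g_demo_push_calls, 1, memory_order_relaxed);

    /* Load-then-add is not race-free; kept simple for a single-threaded sketch. */
    uint32_t len = atomic_load_explicit(&g_demo_queue_len, memory_order_relaxed);
    if (len >= SPILL_QUEUE_LIMIT) {
        return false;   /* flow control depends on the counter */
    }
    atomic_fetch_add_explicit(&g_demo_queue_len, 1, memory_order_relaxed);
    return true;
}

static void demo_dump_stats(void) {
    /* The only reader of g_demo_push_calls: pure diagnostics. */
    fprintf(stderr, "push calls: %llu\n",
            (unsigned long long)atomic_load_explicit(&g_demo_push_calls,
                                                     memory_order_relaxed));
}
```

Compiling out `g_demo_push_calls` would leave `demo_queue_push()` returning the same values, whereas compiling out `g_demo_queue_len` would remove the capacity check. That difference is what the usage-site search is meant to reveal.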
**Step 2: Compile-out implementation (Phase 24-27 pattern)**
- Add a gate to `hakmem_build_flags.h`
- Wrap the TELEMETRY atomic in `#if`
- Build-level compile-out (link-out is forbidden)
**Step 3: A/B Test (build-level comparison)**
- Baseline (COMPILED=0): default build
- Compiled-in (COMPILED=1): research build
- **Verdict:** GO (+0.5%+), NEUTRAL (±0.5%), NO-GO (-0.5%+)
### Verdict
**PROCEDURE COMPLETE**
**Reasons:**
- 4-step procedure established (systematizing the Phase 24-29 lessons)
- Step 0 (execution check) prevents a Phase 29-style miss
- Full atomic audit complete (412 atomics)
- Phase 31 candidate selection complete (TOP 1: `g_tiny_free_trace`)
### Documents
- `docs/analysis/PHASE30_STANDARD_PROCEDURE.md` (standard procedure document)
- `docs/analysis/ATOMIC_AUDIT_FULL.txt` (full atomic audit results)
- `docs/analysis/PHASE31_RECOMMENDED_CANDIDATES.md` (Phase 31 candidates, TOP 3)
- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24-30 summary)
### Lessons
**Three principles for avoiding wasted effort:**
1. **Step 0 is a mandatory gate**: reject ENV-gated code first (Phase 29 lesson)
2. **Counter name ≠ purpose**: check every usage site to decide flow control vs telemetry (Phase 28 lesson)
3. **HOT path first**: execution frequency determines the performance impact (Phase 24-27 lesson)
## Cumulative effect (Phase 24+25+26+27+28+29+30+31)
| Phase | Target | Impact | Status |
|-------|--------|--------|--------|
| **24** | `g_tiny_class_stats_*` (5 atomics) | **+0.93%** | GO ✅ |
| **25** | `g_free_ss_enter` (1 atomic) | **+1.07%** | GO ✅ |
| **26** | Hot path diagnostics (5 atomics) | **-0.33%** | NEUTRAL ✅ |
| **27** | `g_unified_cache_*` (6 atomics) | **+0.74%** | GO ✅ |
| **28** | Background Spill Queue (8 atomics) | **N/A** | NO-OP ✅ |
| **29** | Pool Hotbox v2 Stats (12 atomics) | **0.00%** | NO-OP ✅ |
| **30** | Standard Procedure (412 atomic audit) | **N/A** | PROCEDURE ✅ |
| **31** | `g_tiny_free_trace` (1 atomic) | **-0.35%** | NEUTRAL ✅ |
| **Total** | **18 atomics removed, 412 audited** | **+2.74%** | **✅** |
**Key Insight:** The standard procedure raises the success rate of the next phase.
- Step 0 (execution check) rejects ENV-gated code → prevents a Phase 29-style miss
- Step 1 (classification) rejects CORRECTNESS atomics → prevents a Phase 28-style misjudgment
- HOT path first → the Phase 24-27 success pattern (+0.5% to +1.0%)
- **NEW:** A NEUTRAL verdict can still be adopted for code cleanliness → the Phase 26/31 pattern
## Phase 31: g_tiny_free_trace compile-out complete (2025-12-16)
### Work performed
**Goal:** Compile out (by default) the trace-rate-limit atomic at the top of `hak_tiny_free()` to remove a fixed tax.
**Deliverables:**
1. `docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md` - A/B test results + NEUTRAL verdict
2. `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` updated (Phase 31 appended)
3. `CURRENT_TASK.md` updated (Phase 31 complete + Phase 32 candidates presented)
### A/B Test results
**Baseline (COMPILED=0, trace compiled-out):**
- Mean: 53.64 M ops/s
- Median: 53.80 M ops/s
**Compiled-in (COMPILED=1, trace active):**
- Mean: 53.83 M ops/s
- Median: 53.70 M ops/s
**Difference:**
- Mean: -0.35% (Baseline SLOWER)
- Median: +0.19% (Baseline FASTER)
- **Verdict:** **NEUTRAL** (within ±0.5%)
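The arithmetic behind these figures can be reproduced with a short sketch. It assumes the delta is computed relative to the compiled-in build (the log does not state the denominator) and applies the ±0.5% thresholds from the standard procedure; the function and enum names are hypothetical.

```c
#include <assert.h>

typedef enum { VERDICT_NO_GO = -1, VERDICT_NEUTRAL = 0, VERDICT_GO = 1 } ab_verdict_t;

/* Relative change of the baseline (atomic compiled out) versus the
 * compiled-in build, in percent. Positive means the baseline is faster. */
static double ab_delta_percent(double baseline_mops, double compiled_in_mops) {
    return (baseline_mops - compiled_in_mops) / compiled_in_mops * 100.0;
}

/* GO at +0.5% or better, NO-GO at -0.5% or worse, NEUTRAL in between. */
static ab_verdict_t ab_classify(double delta_percent) {
    if (delta_percent >= 0.5)  return VERDICT_GO;
    if (delta_percent <= -0.5) return VERDICT_NO_GO;
    return VERDICT_NEUTRAL;
}
```

With the Phase 31 numbers, `ab_delta_percent(53.64, 53.83)` is about -0.35 and `ab_delta_percent(53.80, 53.70)` is about +0.19, and both classify as NEUTRAL.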
### Verdict
**NEUTRAL → adopted for code cleanliness**
**Reasons:**
1. **Performance:** Mean -0.35%, Median +0.19% → within measurement noise (conflicting signals)
2. **Phase 26 precedent:** -0.33% NEUTRAL → adopted for code cleanliness
3. **Phase 31 has the same shape:** -0.35% NEUTRAL → apply the same criterion
4. **Code cleanliness benefits:**
- Removes an unused TELEMETRY atomic from the HOT path (`hak_tiny_free()` entry)
- Reduces complexity (trace macro only, no flow control)
- Keeps research flexibility (can be restored with `COMPILED=1`)
**Key Finding:** Not all HOT path atomics have measurable overhead
- Phase 25 (`g_free_ss_enter`): +1.07% GO (always-increment stats)
- Phase 31 (`g_tiny_free_trace`): NEUTRAL (rate-limited to 128 calls)
- **Hypothesis:** Rate-limiting or compiler optimization may eliminate overhead
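The rate-limiting hypothesis can be illustrated with a sketch that contrasts an always-increment counter with a rate-limited one. This is not the actual `g_free_ss_enter` or `g_tiny_free_trace` code; the `demo_*` names and the load-before-increment structure are assumptions chosen to show why a limited counter might cost less once its budget is spent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_TRACE_LIMIT 128u   /* mirrors the "128 calls" figure in the log */

static _Atomic uint64_t g_demo_always_count;   /* bumped on every call */
static _Atomic uint32_t g_demo_trace_count;    /* bumped only while under the limit */

static void demo_free_always(void) {
    /* One atomic read-modify-write on every call, for the whole run. */
    atomic_fetch_add_explicit(&g_demo_always_count, 1, memory_order_relaxed);
}

static int demo_free_rate_limited(void) {
    /* After the first 128 calls this is a relaxed load plus a branch that
     * is never taken again; the counter is no longer written. */
    if (atomic_load_explicit(&g_demo_trace_count, memory_order_relaxed) < DEMO_TRACE_LIMIT) {
        atomic_fetch_add_explicit(&g_demo_trace_count, 1, memory_order_relaxed);
        return 1;   /* a trace line would be emitted here */
    }
    return 0;
}
```

If the real trace counter instead performs a fetch-add on every call and compares the result afterwards, this explanation would not apply, so the sketch should be read as one possible reading of the hypothesis rather than a confirmed cause.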
### Documents
- `docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md` (complete A/B test results)
- `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md` (Phase 24-31 summary)
- `CURRENT_TASK.md` (Phase 31 complete + Phase 32 candidates)
### Lessons
**What Phase 31 taught us:**
1. **HOT path ≠ guaranteed win:** Even high-frequency atomics may have zero overhead if optimized
2. **NEUTRAL is valid:** Code cleanliness justifies compile-out even without performance gain (Phase 26/31 precedent)
3. **Step 0 (execution verification) works:** Prevented Phase 29-style no-op (confirmed always active)
4. **Standard procedure validated:** Phase 30 4-step procedure successfully guided Phase 31
## Next instructions (run Phase 32)
**Phase 31 complete:** NEUTRAL verdict, adopted for code cleanliness → proceed to Phase 32
### Phase 32 推奨候補: `g_hak_tiny_free_calls` (HOT path, TOP PRIORITY) ⭐
**Location:** `core/hakmem_tiny_free.inc:335` (9 lines after Phase 31 target)
**Code Context:**
```c
void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
// Phase 31 target (now compiled-out)
#endif
// Track total tiny free calls (diagnostics)
extern _Atomic uint64_t g_hak_tiny_free_calls;
atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed); // ← Phase 32 target
// ... rest of function ...
}
```
**Classification:**
- **Class:** TELEMETRY (trace macro only)
- **Path:** HOT (every tiny free call)
- **Usage:** Only for `HAK_TRACE` debug macro output
- **ENV Gate:** None (always active)
**Step 0 Verification (inherited from Phase 31):**
- ✅ No ENV gate blocking execution (same function as Phase 31)
- ✅ In `hak_tiny_free()` - called on every tiny free operation
- ✅ Mixed benchmark heavily exercises tiny free path
- ✅ Confirmed: Executes thousands of times per benchmark run (same as Phase 31)
**Expected Impact:** **+0.3% to +0.7%** (smaller than Phase 25: +1.07%, similar to Phase 31: NEUTRAL)
**Implementation Plan:**
**Step 1: Classification (to do)**
- ❓ Classification needed: TELEMETRY or CORRECTNESS?
- ❓ Check all usage sites with `rg -n "g_hak_tiny_free_calls" core/`
- ❓ Verify no `if` conditions using counter value
- ✅ Expected: Pure TELEMETRY (diagnostic counter)
**Step 2: Compile-out implementation**
a) Add BuildFlags gate:
```c
// core/hakmem_build_flags.h
// ========== Tiny Free Calls Counter Prune (Phase 32) ==========
#ifndef HAKMEM_TINY_FREE_CALLS_COMPILED
# define HAKMEM_TINY_FREE_CALLS_COMPILED 0
#endif
```
b) Wrap atomic in `core/hakmem_tiny_free.inc`:
```c
void hak_tiny_free(void* ptr) {
#if HAKMEM_TINY_FREE_TRACE_COMPILED
// Phase 31 (already compiled-out)
#endif
#if HAKMEM_TINY_FREE_CALLS_COMPILED
extern _Atomic uint64_t g_hak_tiny_free_calls;
atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed);
#else
(void)0; // No-op when compiled out
#endif
// ... rest of function ...
}
```
**Step 3: A/B Test**
Baseline (COMPILED=0):
```bash
make clean && make -j bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```
Compiled-in (COMPILED=1):
```bash
make clean && make -j EXTRA_CFLAGS='-DHAKMEM_TINY_FREE_CALLS_COMPILED=1' bench_random_mixed_hakmem
scripts/run_mixed_10_cleanenv.sh
```
**Expected Result:** +0.3% to +0.7% (possible GO, or NEUTRAL like Phase 31)
**Rationale:**
- Same HOT path as Phase 31 (9 lines below in same function)
- No ENV gate blocking execution (verified in Phase 31)
- Similar profile to Phase 31 (diagnostic counter)
- Moderate confidence: NEUTRAL possible (like Phase 31), but worth trying
### Alternative Candidates (if Phase 32 shows NEUTRAL again)
**#3: `g_p0_class_oob_log` (WARM path, error logging)**
- ❓ Execution uncertain (error path)
- Expected: ±0.0% to +0.2%
- Action: Verify execution first
**#4-#N: Manual review of UNKNOWN atomics (284 candidates)**
- Many may be misclassified by naming heuristics
- Requires deeper code inspection
- Lower priority
**Note:** If Phase 32 is NEUTRAL (like Phase 31), consider pausing HOT path atomic prune and moving to other optimization areas (e.g., inlining, branch optimization, SIMD opportunities).
## References
- **Standard Procedure:** `docs/analysis/PHASE30_STANDARD_PROCEDURE.md`
- **Phase 31 Results:** `docs/analysis/PHASE31_TINY_FREE_TRACE_ATOMIC_PRUNE_RESULTS.md`
- **Cumulative Summary:** `docs/analysis/ATOMIC_PRUNE_CUMULATIVE_SUMMARY.md`
- **mimalloc Gap Analysis:** `docs/roadmap/OPTIMIZATION_ROADMAP.md`
- **Box Theory:** Box Refactor pattern from Phase 6-1.7+
- **Phase 24-27 Pattern:** `core/box/tiny_class_stats_box.h`, `core/hakmem_build_flags.h`
- **Phase 26/31 NEUTRAL Precedent:** Code cleanliness adoption without performance win
## Task completion criteria
### Phase 30 completion criteria, all met (2025-12-16):
1. ✅ Created `PHASE30_STANDARD_PROCEDURE.md` (4-step procedure)
2. ✅ Ran the full atomic audit (412 atomics, audit_atomics.sh)
3. ✅ Extracted HOT/WARM path TELEMETRY candidates
4. ✅ Step 0 execution check (all candidates)
5. ✅ Created `PHASE31_RECOMMENDED_CANDIDATES.md` (TOP 3 prioritized)
6. ✅ Updated the cumulative summary (Phase 24-30)
7. ✅ Updated CURRENT_TASK.md (Phase 31 candidates presented)
### Phase 31 completion criteria (2025-12-16):
1. ✅ Candidate selected (`g_tiny_free_trace`, HOT path)
2. ✅ Step 0 execution check done (no ENV gate, execution confirmed)
3. ✅ Step 1 classification done (pure TELEMETRY, no CORRECTNESS use)
4. ✅ Step 2 implementation (BuildFlags + `#if` wrap)
5. ✅ Step 3 A/B test (Baseline vs Compiled-in)
6. ✅ Results document written (PHASE31_RESULTS.md)
7. ✅ NEUTRAL verdict → adopted for code cleanliness
### Preconditions before starting Phase 32:
1. ✅ Candidate selected (`g_hak_tiny_free_calls`, HOT path, same function as Phase 31)
2. ✅ Step 0 execution check done (same function as Phase 31, no ENV gate)
3. ⏳ Step 1 classification (TELEMETRY/CORRECTNESS decision)
4. ⏳ Step 2 implementation (BuildFlags + `#if` wrap)
5. ⏳ Step 3 A/B test (Baseline vs Compiled-in)
6. ⏳ Results document (PHASE32_RESULTS.md)
---
**Last Updated:** 2025-12-16
**Current Phase:** Phase 31 Complete (NEUTRAL -0.35%, adopted for code cleanliness)
**Next Phase:** Phase 32 (`g_hak_tiny_free_calls`, HOT path, expected +0.3% to +0.7% or NEUTRAL)
**Cumulative Progress:** +2.74% (18 atomics removed, 412 atomics audited)