hakmem/docs/archive/PHASE_6.15_P0.3_COMPLETION.md

# Phase 6.15 P0.3 Completion Report: EVOLUTION Restoration + Environment Variable Control

**Date**: 2025-10-22
**Status**: ✅ Complete
**Goal**: Restore EVOLUTION and implement environment variable control (disabled by default)

---

## 📊 **Executive Summary**

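### **Completed Work**

✅ **EVOLUTION block restored**: `#if 0` → `#if HAKMEM_FEATURE_EVOLUTION`
✅ **Environment variable control added**: `HAKMEM_EVO_SAMPLE` (default = 0, disabled)
✅ **Performance verified**: nearly on par with system malloc (+1.0% mean time, 210ns vs 208ns per op)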

---

## 🔧 **Implementation Details**

### **1. Global Variable Added** (hakmem.c:61-62)

```c
// Phase 6.15 P0.3: EVO Sampling Control (environment variable)
static uint64_t g_evo_sample_mask = 0;  // 0 = disabled (default), (1<<N)-1 = sample every 2^N calls
```

**Purpose**: control the sampling frequency through an environment variable

---

### **2. Reading the Environment Variable** (hakmem.c:283-300)

```c
// Phase 6.15 P0.3: Configure EVO sampling from environment variable
// HAKMEM_EVO_SAMPLE: 0=disabled (default), N=sample every 2^N calls
// Example: HAKMEM_EVO_SAMPLE=10 → sample every 1024 calls
//          HAKMEM_EVO_SAMPLE=16 → sample every 65536 calls
char* evo_sample_str = getenv("HAKMEM_EVO_SAMPLE");
if (evo_sample_str && atoi(evo_sample_str) > 0) {
    int freq = atoi(evo_sample_str);
    if (freq >= 64) {
        fprintf(stderr, "[hakmem] Warning: HAKMEM_EVO_SAMPLE=%d too large, using 63\n", freq);
        freq = 63;
    }
    g_evo_sample_mask = (1ULL << freq) - 1;
    HAKMEM_LOG("EVO sampling enabled: every 2^%d = %llu calls\n",
               freq, (unsigned long long)(g_evo_sample_mask + 1));
} else {
    g_evo_sample_mask = 0;  // Disabled by default
    HAKMEM_LOG("EVO sampling disabled (HAKMEM_EVO_SAMPLE not set or 0)\n");
}
```

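**Behavior**:
- Environment variable unset → `g_evo_sample_mask = 0` (disabled, fastest)
- `HAKMEM_EVO_SAMPLE=10` → sample once every 1024 calls
- `HAKMEM_EVO_SAMPLE=16` → sample once every 65536 calls (very lightweight)
- Upper limit 63: values of 64 or more are clamped to 63, so the 64-bit shift cannot overflow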
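The mapping from the environment variable's text to a mask can be isolated into a small pure function, which makes the edge cases easy to check. The sketch below is an illustration, not the code in hakmem.c: `evo_mask_from_string` is a hypothetical helper, and it uses `strtol` instead of `atoi`, so a value with trailing garbage such as `10abc` is treated as disabled rather than parsed as 10.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of hakmem.c): convert the text of
 * HAKMEM_EVO_SAMPLE into a sampling mask. Returns 0 when sampling is off. */
static uint64_t evo_mask_from_string(const char *s) {
    if (s == NULL || *s == '\0') {
        return 0;                       /* unset or empty: disabled */
    }

    char *end = NULL;
    long n = strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        return 0;                       /* not a clean decimal integer */
    }
    if (n <= 0) {
        return 0;                       /* 0 or negative: disabled */
    }
    if (n > 63) {
        n = 63;                         /* shifting a 64-bit value by 64+ is undefined */
    }
    return (UINT64_C(1) << n) - 1;      /* low n bits set: sample every 2^n calls */
}
```

In this sketch the caller would pass `getenv("HAKMEM_EVO_SAMPLE")` once at initialization and store the result in `g_evo_sample_mask`.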

---

### **3. EVOLUTION Block Restored** (hakmem.c:400-417)

**Before (P0.2)**:
```c
#if 0 // HAKMEM_FEATURE_EVOLUTION: disabled (cause of the performance regression)
    static _Atomic uint64_t tick_counter = 0;
    if ((atomic_fetch_add(&tick_counter, 1) & 0x3FF) == 0) {
        clock_gettime(...);
    }
#endif
```

**After (P0.3)**:
```c
#if HAKMEM_FEATURE_EVOLUTION  // restored
    // Only sample if enabled via HAKMEM_EVO_SAMPLE environment variable
    if (g_evo_sample_mask > 0) {  // controlled by the environment variable
        static _Atomic uint64_t tick_counter = 0;
        if ((atomic_fetch_add(&tick_counter, 1) & g_evo_sample_mask) == 0) {
            struct timespec now;
            clock_gettime(CLOCK_MONOTONIC, &now);
            uint64_t now_ns = now.tv_sec * 1000000000ULL + now.tv_nsec;

            if (hak_evo_tick(now_ns)) {
                int new_strategy = hak_elo_select_strategy();
                atomic_store(&g_cached_strategy_id, new_strategy);
            }
        }
    }
#endif
```

**Key changes**:
- Added the `if (g_evo_sample_mask > 0)` check → the sampling path is skipped entirely by default
- `& g_evo_sample_mask` → switched from the fixed value `0x3FF` to a dynamic mask
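To see how the mask gates the sampled path, the check can be exercised on its own. This is a minimal sketch under stated assumptions: `evo_should_sample` and `count_samples` are hypothetical names introduced for illustration, and the counter is passed in explicitly rather than being a function-local static as in hakmem.c.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decide whether the current call is a sampled one.
 * mask == 0 means sampling is disabled and the counter is never touched. */
static bool evo_should_sample(_Atomic uint64_t *counter, uint64_t mask) {
    if (mask == 0) {
        return false;
    }
    /* The counter's low bits are all zero once every (mask + 1) calls. */
    return (atomic_fetch_add(counter, 1) & mask) == 0;
}

/* Hypothetical driver: count how many of `calls` calls would be sampled. */
static uint64_t count_samples(uint64_t mask, uint64_t calls) {
    _Atomic uint64_t counter = 0;
    uint64_t hits = 0;
    for (uint64_t i = 0; i < calls; i++) {
        if (evo_should_sample(&counter, mask)) {
            hits++;
        }
    }
    return hits;
}
```

With a mask of 1023 (`HAKMEM_EVO_SAMPLE=10`), a 10,000-call loop in this sketch takes the sampled path 10 times; with a mask of 0 it never does, and the atomic increment is skipped as well.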

---

## 📊 **Benchmark Results**

### **Test 1: Default (EVOLUTION disabled)**

```bash
./bench_allocators --allocator hakmem-baseline --scenario json --iterations 10000
```

**Result**:
```
hakmem-baseline,json,10000,210,17,0,0,4758892
```

- **Mean time**: 210ns/op
- **Throughput**: 4.76M ops/sec

---

### **Test 2: system malloc (comparison)**

```bash
./bench_allocators --allocator system --scenario json --iterations 10000
```

**Result**:
```
system,json,10000,208,17,0,0,4802216
```

- **Mean time**: 208ns/op
- **Throughput**: 4.80M ops/sec

---

### **Test 3: EVOLUTION Enabled (sampling every 1024 calls)**

```bash
HAKMEM_EVO_SAMPLE=10 ./bench_allocators --allocator hakmem-baseline --scenario json --iterations 10000
```

**Result**:
```
hakmem-baseline,json,10000,215,17,0,0,4639566
```

- **Mean time**: 215ns/op
- **Throughput**: 4.64M ops/sec

---

### **Comparison Table**

| Mode | Mean time | ops/sec | vs system (mean time) | Notes |
|--------|----------|---------|-----------|------|
| **hakmem (default, EVO disabled)** | 210ns | 4.76M | **+1.0%** | ✅ Fastest hakmem configuration |
| **system malloc** | 208ns | 4.80M | baseline | Reference |
| **hakmem (EVO enabled, 1024)** | 215ns | 4.64M | **+3.4%** | Learning ON |
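The percentages in this report can be reproduced from the rounded mean times. A minimal sketch; `overhead_pct` and `ops_per_sec` are illustrative helpers, not part of bench_allocators:

```c
/* Relative slowdown of `measured_ns` against `baseline_ns`, in percent. */
static double overhead_pct(double measured_ns, double baseline_ns) {
    return (measured_ns - baseline_ns) / baseline_ns * 100.0;
}

/* Throughput implied by a mean per-operation time in nanoseconds. */
static double ops_per_sec(double ns_per_op) {
    return 1e9 / ns_per_op;
}
```

Under this definition, 210ns against 208ns is about +0.96% (reported as +1.0%), 215ns against 208ns is about +3.4%, and 215ns against 210ns is about +2.4%. The throughput implied by the rounded 210ns is roughly 4.76M ops/sec, close to the reported 4758892; the small difference presumably comes from the harness working with unrounded timings.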

---

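## ✅ **Verification Checklist**

1. ✅ **EVOLUTION restored**: `#if 0` → `#if HAKMEM_FEATURE_EVOLUTION`
2. ✅ **Environment variable control works**: `HAKMEM_EVO_SAMPLE` behaves as expected
3. ✅ **Disabled by default**: `g_evo_sample_mask = 0` when the environment variable is unset
4. ✅ **Performance maintained**: +1.0% vs system malloc (nearly on par)
5. ✅ **Overhead with sampling ON**: +2.4% (210ns → 215ns), within the acceptable range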

---

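## 🎯 **Next Steps**

### **Option A: Retest with the larson benchmark** (not recommended)

**Reasons**:
- larson is too heavy to be useful for investigation (26-minute run)
- With mixed small-size allocations, mutex overhead dominates
- Better to wait until the TLS implementation (Step 3)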

---

### **Option B: Proceed to Step 3** (recommended)

**Next phase: Phase 6.15 Step 3 (TLS Multi-threaded)**

**Goals**:
- Implement Thread-Local Storage (TLS)
- Move Tiny Pool / L2 Pool / L2.5 Pool to TLS
- Avoid 95%+ of lock acquisitions → 13-15M ops/sec on larson (4T)

**Estimate**: 8-10 hours

**Rationale**:
- TLS was already validated in Phase 6.13 (+123-146% demonstrated)
- The lightweight benchmark is already on par with system
- Without TLS, larson performance cannot be improved
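The lock-avoidance idea behind a TLS pool can be sketched in a few lines. Everything here is hypothetical and only illustrates the general technique: `block_t`, `pool_alloc`, `pool_free`, and `TLS_CACHE_CAP` are invented names, and the actual Step 3 design for the Tiny Pool / L2 Pool / L2.5 Pool may differ substantially.

```c
#include <pthread.h>
#include <stddef.h>

#define TLS_CACHE_CAP 32   /* hypothetical per-thread cache limit */

typedef struct block { struct block *next; } block_t;

/* Shared free list: every access takes the mutex. */
static pthread_mutex_t g_pool_lock = PTHREAD_MUTEX_INITIALIZER;
static block_t *g_pool_head = NULL;

/* Per-thread free list: no lock needed, only the owning thread touches it. */
static _Thread_local block_t *t_cache_head = NULL;
static _Thread_local int t_cache_count = 0;

static void pool_free(block_t *b) {
    if (t_cache_count < TLS_CACHE_CAP) {      /* fast path: lock-free */
        b->next = t_cache_head;
        t_cache_head = b;
        t_cache_count++;
        return;
    }
    pthread_mutex_lock(&g_pool_lock);         /* slow path: cache is full */
    b->next = g_pool_head;
    g_pool_head = b;
    pthread_mutex_unlock(&g_pool_lock);
}

static block_t *pool_alloc(void) {
    block_t *b = t_cache_head;
    if (b != NULL) {                          /* fast path: lock-free */
        t_cache_head = b->next;
        t_cache_count--;
        return b;
    }
    pthread_mutex_lock(&g_pool_lock);         /* slow path: cache is empty */
    b = g_pool_head;
    if (b != NULL) {
        g_pool_head = b->next;
    }
    pthread_mutex_unlock(&g_pool_lock);
    return b;                                 /* NULL when both lists are empty */
}
```

In a sketch like this, the mutex is taken only when the per-thread cache is empty or full, which is the mechanism the lock-avoidance goal relies on; the actual hit rate depends on the workload and cache size.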

---

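## 📝 **Lessons Learned**

### **1. Disabling EVOLUTION Backfired**

**P0.2 results**:
- Disabling EVOLUTION → 41% performance drop (1.05M → 0.62M)
- Pool ERROR occurred
- Complex initialization dependencies

**Lesson**: full control through an environment variable is safer than partially disabling code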

---

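### **2. Disabled by Default + Opt-in Works Best**

**Design**:
- Default: `g_evo_sample_mask = 0` (disabled, fastest)
- Only when needed: `HAKMEM_EVO_SAMPLE=10` (learning ON)

**Benefits**:
- ✅ Out-of-the-box performance: on par with system
- ✅ Learning feature: available as an opt-in
- ✅ Debugging: easy to toggle through the environment variable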

---

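### **3. The Importance of Lightweight Benchmarks**

**larson vs bench_allocators**:

| Item | larson | bench_allocators |
|------|--------|------------------|
| Run time | 26 min | <1 s |
| Size range | 8-1024B mixed | 64KB fixed |
| Suited to debugging | ❌ | ✅ |

**Lesson**: check baseline performance with a lightweight benchmark, and production-like behavior with a heavyweight one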

---

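## 🚀 **Recommended Actions**

1. ✅ **Confirm P0.3 completion** - this document written
2. ⏭️ **Update CURRENT_TASK.md** - record the next steps
3. ⏭️ **Plan Step 3** - detailed design of the TLS implementation
4. ⏭️ **Launch ultrathink** - draft the TLS implementation plan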

---

**Status**: ✅ P0.3 complete; next is Step 3 (TLS implementation)