
Moe Charm (CI) 19056282b6 Phase 3 D2: Wrapper Env Cache - [DECISION: NO-GO]

Target: Reduce wrapper_env_cfg() overhead in malloc/free hot path
- Strategy: Cache wrapper env configuration pointer in TLS
- Approach: Fast pointer cache (TLS caches const wrapper_env_cfg_t*)

Implementation:
- core/box/wrapper_env_cache_env_box.h: ENV gate (HAKMEM_WRAP_ENV_CACHE)
- core/box/wrapper_env_cache_box.h: TLS cache layer (wrapper_env_cfg_fast)
- core/box/hak_wrappers.inc.h: Integration into malloc/free hot paths
- ENV gate: HAKMEM_WRAP_ENV_CACHE=0/1 (default OFF)

A/B Test Results (Mixed, 10-run, 20M iters):
- Baseline (D2=0): 46.52M ops/s (avg), 46.47M ops/s (median)
- Optimized (D2=1): 45.85M ops/s (avg), 45.98M ops/s (median)
- Delta: avg -1.44%, median -1.05% (DECISION: NO-GO)

Analysis:
- Regression cause: TLS cache adds overhead (branch + TLS access)
- wrapper_env_cfg() is already minimal (pointer return after simple check)
- Adding TLS caching layer makes it worse, not better
- Branch prediction penalty outweighs any potential savings

Cumulative Phase 2-3:
- B3: +2.89%, B4: +1.47%, C3: +2.20%
- D1: +1.06% (opt-in), D2: -1.44% (NO-GO)
- Total: ~7.2% (excluding D2)

Decision: FREEZE as research box (default OFF, regression confirmed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2025-12-13 22:03:27 +09:00

7.7 KiB


Phase 3 C2: Slab Metadata Cache Optimization Design Memo

Goal

The free path accounts for 29-31% of total time; improve the cache locality of its metadata accesses (policy snapshot, slab descriptor).

Expected: +5-10% improvement

Results (A/B)

Verdict: 🔬 NEUTRAL (kept as a research box, default OFF)

Mixed 10-run:
- Baseline (HAKMEM_TINY_METADATA_CACHE=0): avg 40.43M / median 40.72M
- Optimized (HAKMEM_TINY_METADATA_CACHE=1): avg 40.25M / median 40.29M
- Delta: avg -0.45% / median -1.06%
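The deltas above are plain relative changes of the optimized run against the baseline. Below is a minimal C sketch of that arithmetic. The helper name pct_delta is hypothetical and is not part of hakmem.

```c
#include <assert.h>

/* Hypothetical helper: relative change of `optimized` vs `baseline`, in percent.
 * A negative result means the optimized build is slower (lower throughput). */
double pct_delta(double baseline, double optimized) {
    assert(baseline > 0.0);
    return (optimized - baseline) / baseline * 100.0;
}
```

For example, pct_delta(40.43, 40.25) is about -0.45 and pct_delta(40.72, 40.29) is about -1.06, which reproduces the avg and median deltas above.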

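Reasons (summary):

Policy hot cache: the probe cost of the learner interlock cancelled out the gain
First page cache: lookup turned out not to be the dominant cost (the free side is already lightweight)
Bounds check: the compiler was already optimizing it away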

Observations and Rationale

Current memory access pattern (free path)

free(ptr)
  → tiny_free_fast()
    ① Fetch TLS g_unified_cache[class]              [L1 miss rate: ~5%]
    ② policy_snapshot() / static_route              [L1 miss rate: ~2%]
    ③ Slab descriptor lookup (superslab, segment)   [L1 miss rate: ~8%]
    ④ slots[] array write                           [L1 miss rate: ~10%]

Bottleneck: ③ slab descriptor lookup

Tiny classes (C0-C3): tiny_legacy_fallback_free_base()
Larger classes: small_heap_free() → policy → route → handler

Optimization targets

Policy struct: currently a global plus a TLS cache
- Ideal: replicate only the hot members (route_kind[8], learner_v7_enabled) in TLS
Slab descriptor: SmallSegment or SmallPageMeta lookup
- Current: 64-byte-aligned metadata struct (spans multiple cache lines)
- Proposal: inline the first page descriptor into the TLS context (a "second tier" cache for C0-C7)
Unified cache slot[]: already in TLS, but head/tail updates are frequent because of array bounds checks
- Proposal: make capacity a macro constant so the bounds check compiles out

Box Theory (implementation plan)

L0: Env (reversible)

HAKMEM_TINY_METADATA_CACHE=0/1  # default: 0 (OFF)

Purpose: toggle the hot metadata layout on/off (revert immediately on regression)
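Below is a minimal sketch of how the L0 gate could be read once and then cached. It assumes that an unset variable, an empty value, or a value starting with '0' means OFF. Only tiny_metadata_cache_enabled() and the variable name come from this memo. The parser tiny_metadata_cache_parse() and the static g_tiny_metadata_cache_state are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parser: NULL (unset), "" or "0..." -> OFF (0); anything else -> ON (1). */
int tiny_metadata_cache_parse(const char* v) {
    return (v != NULL && v[0] != '\0' && v[0] != '0') ? 1 : 0;
}

/* Hypothetical cache for the gate: read lazily on first use, then reused.
 * Concurrent first calls may each call getenv(), but they all store the same
 * value; a production version would use an atomic or one-time init instead. */
static int g_tiny_metadata_cache_state = -1;  /* -1 = not read yet */

int tiny_metadata_cache_enabled(void) {
    if (__builtin_expect(g_tiny_metadata_cache_state < 0, 0)) {
        g_tiny_metadata_cache_state =
            tiny_metadata_cache_parse(getenv("HAKMEM_TINY_METADATA_CACHE"));
    }
    return g_tiny_metadata_cache_state;
}
```

Caching the result keeps the gate check on the hot path down to a single load and compare. The trade-off is that changing the variable after the first call has no effect.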

L1: MetadataCacheBox (2 boundaries)

1.1 Policy Hot Cache

Responsibility: skip policy_snapshot() on the free path

// Add to tiny_env_box.h
#include <stdint.h>

typedef struct {
    uint8_t route_kind[8];      // C0-C7 route (copied from policy, not learner-synced)
    uint8_t learner_v7_enabled; // Boolean: is learner v7 active?
} TinyPolicyHot;  // 9 bytes (all uint8_t, no padding), fits in a 16-byte slot

extern __thread TinyPolicyHot g_policy_hot;

Initialization:

Call tiny_policy_hot_refresh() at malloc/free entry (in place of policy_snapshot())
Disable while learner_v7 is enabled (the learner updates route_kind dynamically)
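Below is a sketch of the refresh step under stated assumptions. SmallPolicyGlobal and g_small_policy are hypothetical stand-ins for whatever policy_snapshot() reads. tiny_policy_hot_route() and t_policy_hot_usable are also hypothetical. The sketch copies only the hot members and refuses to serve cached routes while learner v7 is active. It ignores synchronization with the policy writer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t route_kind[8];      /* C0-C7 route */
    uint8_t learner_v7_enabled; /* is learner v7 active? */
} TinyPolicyHot;

/* Hypothetical stand-in for the global policy read by policy_snapshot(). */
typedef struct {
    uint8_t route_kind[8];
    uint8_t learner_v7_enabled;
} SmallPolicyGlobal;

SmallPolicyGlobal g_small_policy;          /* hypothetical global */
__thread TinyPolicyHot g_policy_hot;
static __thread bool t_policy_hot_usable;  /* hypothetical: false until a learner-off refresh */

/* Copy only the hot members into TLS. While learner v7 is active it rewrites
 * route_kind at runtime, so the copy is marked unusable (slow path required). */
void tiny_policy_hot_refresh(void) {
    memcpy(g_policy_hot.route_kind, g_small_policy.route_kind,
           sizeof g_policy_hot.route_kind);
    g_policy_hot.learner_v7_enabled = g_small_policy.learner_v7_enabled;
    t_policy_hot_usable = !g_small_policy.learner_v7_enabled;
}

/* Hypothetical accessor: cached route for class `cls`, or -1 for the slow path. */
int tiny_policy_hot_route(unsigned cls) {
    assert(cls < 8);
    return t_policy_hot_usable ? g_policy_hot.route_kind[cls] : -1;
}
```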

1.2 Slab First Page Inline

Responsibility: raise the "first hit" rate of slab descriptor lookups

// Add to tiny_front_hot_box.h (TLS context)
typedef struct {
    // Tiny LEGACY cache
    void* first_page_base;      // Current page pointer (avoid superslab lookup)
    uint16_t first_page_free_count;  // Free slots in current page
} TinyFirstPageCache;

extern __thread TinyFirstPageCache g_first_page_cache[8];  // Per-class

Lifecycle:

On refill: first_page_cache[class] = new page
On retire: first_page_cache[class] = NULL (updated again on the next refill)
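Below is a sketch of the per-class first page cache lifecycle and its hit check. tiny_first_page_cache_update() and tiny_first_page_cache_hit() are the names planned in Patch 2, but the signatures here are assumptions. tiny_first_page_cache_invalidate() and the 4 KiB TINY_PAGE_SIZE are hypothetical. The range check uses uintptr_t because relational comparison of pointers into different objects is undefined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TINY_NUM_CLASSES 8
#define TINY_PAGE_SIZE   4096u  /* assumption: the real tiny page size may differ */

typedef struct {
    void*    first_page_base;        /* current page, NULL when invalid */
    uint16_t first_page_free_count;  /* free slots in current page */
} TinyFirstPageCache;

__thread TinyFirstPageCache g_first_page_cache[TINY_NUM_CLASSES];

/* Refill: point the class's cache at the freshly refilled page. */
void tiny_first_page_cache_update(unsigned cls, void* page, uint16_t free_count) {
    assert(cls < TINY_NUM_CLASSES);
    g_first_page_cache[cls].first_page_base = page;
    g_first_page_cache[cls].first_page_free_count = free_count;
}

/* Retire (hypothetical helper): invalidate explicitly so a stale page never hits. */
void tiny_first_page_cache_invalidate(unsigned cls) {
    assert(cls < TINY_NUM_CLASSES);
    g_first_page_cache[cls].first_page_base = NULL;
    g_first_page_cache[cls].first_page_free_count = 0;
}

/* Free-path check: true if ptr lies inside the cached page. If p < base the
 * unsigned subtraction wraps to a huge value, so the single compare covers both ends. */
bool tiny_first_page_cache_hit(unsigned cls, const void* ptr) {
    assert(cls < TINY_NUM_CLASSES);
    uintptr_t base = (uintptr_t)g_first_page_cache[cls].first_page_base;
    uintptr_t p = (uintptr_t)ptr;
    return base != 0 && p - base < TINY_PAGE_SIZE;
}
```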

L2: Integration Points (2 sites)

2.1 Free path (target)

// tiny_legacy_fallback_box.h
static inline void tiny_legacy_fallback_free_base(...) {
    if (tiny_metadata_cache_enabled()) {
        // Fast: check the cached first page (NULL after retire)
        char* base = (char*)g_first_page_cache[class].first_page_base;
        if (base != NULL &&
            (char*)ptr >= base &&
            (char*)ptr < base + PAGE_SIZE) {
            // Hit: update counter, push to unified cache
            ...
            return;
        }
    }
    // Slow: standard superslab lookup
    ...
}

2.2 Policy hot cache refresh

// malloc/free wrapper
if (__builtin_expect(small_policy_v7_version_changed(), 0)) {
    if (tiny_metadata_cache_enabled()) {
        tiny_policy_hot_refresh();  // Sync route_kind[8] from policy
    }
}
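small_policy_v7_version_changed() is used above but is not defined in this memo. One plausible shape assumes that the policy writer bumps a global version counter. Each thread then compares that counter, read with an acquire load, against the last version it saw. g_small_policy_version, t_seen_policy_version and small_policy_publish() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the policy writer bumps this after publishing a new policy. */
static _Atomic uint32_t g_small_policy_version = 1;
/* Hypothetical per-thread last-seen version; 0 forces a refresh on first use. */
static __thread uint32_t t_seen_policy_version = 0;

/* Hypothetical writer side: release pairs with the acquire load below. */
void small_policy_publish(void) {
    atomic_fetch_add_explicit(&g_small_policy_version, 1, memory_order_release);
}

/* True once per thread after each publish (and on the thread's first call). */
bool small_policy_v7_version_changed(void) {
    uint32_t v = atomic_load_explicit(&g_small_policy_version, memory_order_acquire);
    if (v == t_seen_policy_version) return false;
    t_seen_policy_version = v;
    return true;
}
```

The common case is one relaxed-cost load plus a TLS compare. This fits the __builtin_expect(..., 0) hint in the wrapper, because refreshes are rare.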

Implementation instructions (incremental patches)

Patch 1: Policy Hot Cache (small)

Files:

core/box/tiny_metadata_cache_env_box.h (new)
- ENV gate
- TinyPolicyHot struct
core/box/tiny_metadata_cache_hot_box.h (new)
- API: tiny_policy_hot_refresh()
- TLS: extern __thread TinyPolicyHot g_policy_hot;
core/box/tiny_metadata_cache_hot_box.c (new)
- Implementation

Conditions:

tiny_metadata_cache_enabled() && route_kind != learner_active
__builtin_expect(tiny_metadata_cache_enabled(), 0) (expected false: the gate is ON only when opted in)

Measurement: expect +1-3% in the A/B test

Patch 2: First Page Inline Cache (medium)

Files:

core/front/tiny_first_page_cache.h (new)
- TinyFirstPageCache struct
- tiny_first_page_cache_hit() inline check
- tiny_first_page_cache_update() on refill
core/front/tiny_legacy_fallback_box.h (modified)
- Add the first_page_cache check to the free path

Conditions:

LEGACY route only (not needed for MID/ULTRA)
Updated automatically on refill

Measurement: expect +3-7% with Patch 1 + 2

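Patch 3: Bounds Check Compile-out (minor optimization)

Files:

core/front/tiny_unified_cache.h (modified)
- Make capacity a macro constant (hardcoding the Hot_2048 strategy)

Conditions:

With a compile-time constant, the mask & (2048-1) is computed only once, at compile time

Measurement: expect +4-9% with Patch 1+2+3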
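A compile-time capacity helps because, with a power-of-two constant, the wrap-around reduces to a single AND with 2047. A runtime capacity instead forces a real division or a variable mask. The sketch below assumes a ring index over the 2048-entry Hot_2048 capacity. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants mirroring the Hot_2048 hardcode. */
#define TINY_UNIFIED_CACHE_CAP  2048u
#define TINY_UNIFIED_CACHE_MASK (TINY_UNIFIED_CACHE_CAP - 1u)
_Static_assert((TINY_UNIFIED_CACHE_CAP & (TINY_UNIFIED_CACHE_CAP - 1u)) == 0,
               "capacity must be a power of two for the mask trick");

/* Runtime capacity: the compiler must emit a division for the modulo. */
static inline uint32_t next_idx_runtime(uint32_t i, uint32_t cap) {
    return (i + 1u) % cap;
}

/* Compile-time capacity: folds to (i + 1) & 2047, no division or bounds branch. */
static inline uint32_t next_idx_const(uint32_t i) {
    return (i + 1u) & TINY_UNIFIED_CACHE_MASK;
}
```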

A/B (GO/NO-GO)

Test Plan

Profile: HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE

Baseline: Patch 0 (current state, including the +2.20% from C3)

HAKMEM_TINY_METADATA_CACHE=0  (default OFF)

Optimized: Patch 1+2 (required) + Patch 3 (optional)

HAKMEM_TINY_METADATA_CACHE=1  (metadata cache ON)

Conditions:

Measure with Mixed (10-run)
Learner disabled (policy_hot is disabled when the learner is ON)

Decision criteria:

GO: +1.0% or better
NEUTRAL: within ±1.0% → keep as a research box (Patch 1+2 stay default OFF)
NO-GO: -1.0% or worse → FREEZE (disable via the ENV gate)
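Below is the decision rule above, written out as a small classifier. AbVerdict and ab_verdict() are hypothetical names. The memo does not state whether the avg or the median delta drives the verdict. Applied to the C2 avg delta of -0.45%, the rule yields NEUTRAL, which matches the recorded verdict.

```c
#include <assert.h>

/* Hypothetical verdict type for the GO / NEUTRAL / NO-GO criteria. */
typedef enum { AB_NO_GO = -1, AB_NEUTRAL = 0, AB_GO = 1 } AbVerdict;

/* Maps an A/B delta in percent to a verdict using the ±1.0% thresholds:
 * >= +1.0 -> GO, <= -1.0 -> NO-GO, anything in between -> NEUTRAL. */
AbVerdict ab_verdict(double delta_pct) {
    if (delta_pct >= 1.0)  return AB_GO;
    if (delta_pct <= -1.0) return AB_NO_GO;
    return AB_NEUTRAL;
}
```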

Risk assessment

Safety checks

Risk	Mitigation
Stale policy hot cache	OFF while the learner is enabled; bench putenv sync in place
Invalid first page cache	Explicit invalidation on refill/retire
Bounds check miss	Compiled out via the hardcoded macro (type-safe)
Lock depth	Not needed (free path)

Rollback

Disable immediately with HAKMEM_TINY_METADATA_CACHE=0
ENV gate only (no code removal needed)

Basis for the expected gains

Why +5-10%?

Policy hot cache: bypasses policy_snapshot() → -2 memory ops
- Expected: +1-2%
First page inline: bypasses the superslab lookup → -1 to -2 memory ops
- Expected: +2-4%
Bounds check compile-out: removes the modulo operation
- Expected: +0.5-1%

Total: +3.5-7% (conservatively +3-5%)
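The total is the sum of the per-item ranges. Compounding the items instead gives nearly the same bounds at these magnitudes (about +3.54% and +7.14%):

```latex
\begin{aligned}
\text{low (additive)}  &= 1 + 2 + 0.5 = 3.5\,\% \\
\text{high (additive)} &= 2 + 4 + 1 = 7\,\% \\
\text{low (compounded)}  &= 1.01 \cdot 1.02 \cdot 1.005 - 1 \approx 0.0354 \\
\text{high (compounded)} &= 1.02 \cdot 1.04 \cdot 1.01 - 1 \approx 0.0714
\end{aligned}
```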

Implementation schedule (estimate)

Patch 1 implementation (15-20 min)
- Create Box files
- Cache refresh logic
- ENV gate
Patch 2 implementation (20-30 min)
- First page inline cache
- Free path integration
Build & Test (10 min)
- Verify compilation
- Sanity benchmark
A/B Test (15-20 min)
- 10-run Mixed
- Statistical analysis
Commit & Summary (5 min)
- GO/NO-GO decision
- Update documentation

Total: approx. 65-85 min

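Non-goals

Changing the routing algorithm (the metadata cache is only a "hint")
Fixing interactions with the learner (the cache operates only while the learner is disabled)
Cold path optimization (C2 focuses on the hot path)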
