hakmem/docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md
Moe Charm (CI) 0b306f72f4 Phase 14 kickoff: Pointer-chase reduction (tcache-style intrusive LIFO)
Design and implementation plan for Phase 14 v1:
- Target: Reduce pointer-chase overhead in TinyUnifiedCache
- Strategy: Add intrusive LIFO tcache layer before array-based cache
- Inspired by glibc tcache (per-bin head pointer, intrusive next)

Approach:
- L0: tiny_tcache_env_box (ENV gate: HAKMEM_TINY_TCACHE=0/1, default OFF)
- L1: tiny_tcache_box (intrusive LIFO: push/pop with cap=64)
- Integration: Inside unified_cache_push/pop (minimal call site changes)

Expected benefits:
- tcache hit: No array access, just head pointer + intrusive next
- Better locality (LIFO vs FIFO)
- Closer to system malloc tcache behavior

A/B plan:
- Test: HAKMEM_TINY_TCACHE=0/1 on Mixed 10-run
- GO threshold: +1.0% mean
- Rollback: ENV-gated, default OFF

Files added:
- docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md
- docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_NEXT_INSTRUCTIONS.md

Next: Implement Phase 14 v1 patches (ENV box → tcache box → integration)

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-15 00:32:56 +09:00


Phase 14: Pointer-Chase Reduction v1, Next Instructions (Tiny tcache intrusive LIFO)

0. Status

  • Phase 13 v1 / E5-2 found the header write tax to be NEUTRAL, so we move on to the next hypothesis
  • Next focus: thread cache / pointer chase (a structural difference from system malloc's tcache is the most likely cause)

Design: docs/analysis/PHASE14_POINTER_CHASE_REDUCTION_1_DESIGN.md


1. Goal (GO criteria)

On the Mixed 10-run (clean env):

  • GO: mean +1.0% or better
  • NO-GO: mean -1.0% or worse (immediate rollback / freeze)
  • NEUTRAL: within ±1.0% (freeze as a research box)
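
The thresholds above can be sketched as a small classifier. This is an illustrative sketch only: the names ab_verdict_t and ab_classify are hypothetical and not part of hakmem, and it assumes the compared means are throughput figures where higher is better.

```c
#include <assert.h>

/* Hypothetical names: illustrative only, not part of hakmem. */
typedef enum { AB_NO_GO = -1, AB_NEUTRAL = 0, AB_GO = 1 } ab_verdict_t;

/* baseline_mean:  mean over the 10 runs with HAKMEM_TINY_TCACHE=0
 * optimized_mean: mean over the 10 runs with HAKMEM_TINY_TCACHE=1
 * Assumption: both are throughput values (higher is better). */
static ab_verdict_t ab_classify(double baseline_mean, double optimized_mean) {
    assert(baseline_mean > 0.0);
    double delta_pct = (optimized_mean - baseline_mean) / baseline_mean * 100.0;
    if (delta_pct >= 1.0)  return AB_GO;     /* +1.0% or better */
    if (delta_pct <= -1.0) return AB_NO_GO;  /* -1.0% or worse  */
    return AB_NEUTRAL;                       /* inside the band */
}
```

If the benchmark reports a lower-is-better metric (for example latency), the sign of delta_pct would need to be flipped before applying the same thresholds.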

2. Implementation patch order (stack small patches)

Patch 1: L0 ENV Box (reversible + refresh)

New files:

  • core/box/tiny_tcache_env_box.h
  • core/box/tiny_tcache_env_box.c

ENV:

  • HAKMEM_TINY_TCACHE=0/1 (default: 0)
  • HAKMEM_TINY_TCACHE_CAP=64 (default: 64)

API:

  • tiny_tcache_enabled()
  • tiny_tcache_cap()
  • tiny_tcache_env_refresh_from_env()

Requirements:

  • Do not call getenv() on the hot path (cached reads only)
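
A minimal sketch of how such an ENV box might look, assuming plain global cached values and lazy initialization. The three API names come from the list above; the two parse helpers, the accepted cap range, and the lack of any thread-safe initialization are assumptions of this sketch, not the actual hakmem implementation.

```c
#include <assert.h>
#include <stdlib.h>

#define TINY_TCACHE_CAP_DEFAULT 64

/* Cached settings; -1 means "not read from the environment yet". */
static int g_tiny_tcache_enabled = -1;
static int g_tiny_tcache_cap = TINY_TCACHE_CAP_DEFAULT;

/* Hypothetical helper: only the exact string "1" enables the tcache. */
static int tiny_tcache_parse_enabled(const char *s) {
    return (s != NULL && s[0] == '1' && s[1] == '\0') ? 1 : 0;
}

/* Hypothetical helper: fall back to the default on missing or out-of-range
 * input. The accepted range [1, 65535] is an assumption of this sketch. */
static int tiny_tcache_parse_cap(const char *s) {
    if (s == NULL) return TINY_TCACHE_CAP_DEFAULT;
    long v = strtol(s, NULL, 10);
    return (v >= 1 && v <= 65535) ? (int)v : TINY_TCACHE_CAP_DEFAULT;
}

/* Cold path: the only function that calls getenv(). */
void tiny_tcache_env_refresh_from_env(void) {
    g_tiny_tcache_cap = tiny_tcache_parse_cap(getenv("HAKMEM_TINY_TCACHE_CAP"));
    g_tiny_tcache_enabled = tiny_tcache_parse_enabled(getenv("HAKMEM_TINY_TCACHE"));
    assert(g_tiny_tcache_cap >= 1);
}

/* Hot path: after the first call this is one branch plus a cached load. */
static inline int tiny_tcache_enabled(void) {
    if (g_tiny_tcache_enabled < 0) tiny_tcache_env_refresh_from_env();
    return g_tiny_tcache_enabled;
}

static inline int tiny_tcache_cap(void) {
    if (g_tiny_tcache_enabled < 0) tiny_tcache_env_refresh_from_env();
    return g_tiny_tcache_cap;
}
```

Because the values are cached, anything that changes the environment after the first read has to call tiny_tcache_env_refresh_from_env() again for the change to take effect.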

Patch 2: L1 tcache Box (intrusive LIFO)

New files:

  • core/box/tiny_tcache_box.h

Contents:

  • Keep a __thread head/count per class (fixed at 8 classes)
  • All reads and writes of the next pointer must go through tiny_next_store/load

API:

  • tiny_tcache_try_push(class_idx, base) -> bool
  • tiny_tcache_try_pop(class_idx) -> void* (BASE or NULL)
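
The push/pop pair described above can be sketched as follows. This is a sketch under stated assumptions: the tiny_next_store/tiny_next_load bodies here are stand-ins that keep the link at offset 0 of the block (the real hakmem helpers may use a different signature or offset), the cap is a compile-time constant standing in for tiny_tcache_cap(), and the array names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TINY_TCACHE_NUM_CLASSES 8   /* fixed at 8 classes, per the text */
#define TINY_TCACHE_CAP         64  /* stand-in for tiny_tcache_cap() */

/* Stand-ins for tiny_next_store/load (assumption: link lives at offset 0). */
static inline void tiny_next_store(void *base, void *next) {
    memcpy(base, &next, sizeof next);
}
static inline void *tiny_next_load(const void *base) {
    void *next;
    memcpy(&next, base, sizeof next);
    return next;
}

/* Per-thread, per-class list head and element count (hypothetical names). */
static __thread void    *g_tcache_head[TINY_TCACHE_NUM_CLASSES];
static __thread uint16_t g_tcache_count[TINY_TCACHE_NUM_CLASSES];

/* Push base onto the class list; returns false when the list is at cap. */
static inline bool tiny_tcache_try_push(int class_idx, void *base) {
    assert(class_idx >= 0 && class_idx < TINY_TCACHE_NUM_CLASSES);
    if (g_tcache_count[class_idx] >= TINY_TCACHE_CAP) return false;
    tiny_next_store(base, g_tcache_head[class_idx]); /* link to old head */
    g_tcache_head[class_idx] = base;
    g_tcache_count[class_idx]++;
    return true;
}

/* Pop the most recently pushed block, or NULL when the list is empty. */
static inline void *tiny_tcache_try_pop(int class_idx) {
    assert(class_idx >= 0 && class_idx < TINY_TCACHE_NUM_CLASSES);
    void *base = g_tcache_head[class_idx];
    if (base == NULL) return NULL;
    g_tcache_head[class_idx] = tiny_next_load(base); /* unlink */
    g_tcache_count[class_idx]--;
    return base;
}
```

A hit touches only the head pointer and the first word of the block itself, which is the property the phase is trying to exploit. Every block pushed must be at least one pointer wide.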

Patch 3: Integrate inside unified_cache (no new call sites)

Modified:

  • core/front/tiny_unified_cache.h (a “single if” at the top of unified_cache_push/pop)

Policy:

  • tcache hit: return immediately (do not touch the array)
  • miss/overflow: fall back to the existing array cache
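
The hit and fallback policy above can be sketched as below. Everything here is a stand-in so the example compiles on its own: the gate variable, the miniature tcache, the array cache layout, and the unified_cache_push/pop signatures are all assumptions, since the real tiny_unified_cache.h is not shown in this note. The real tcache box would also go through tiny_next_store/load rather than a raw pointer write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CLASSES 8
#define TCACHE_CAP  2   /* deliberately tiny so overflow is easy to trigger */
#define ARRAY_CAP   4

static int g_tcache_on = 1;  /* stand-in for tiny_tcache_enabled() */

/* Miniature stand-in tcache: link kept in the first word of each block. */
static __thread void *t_head[NUM_CLASSES];
static __thread int   t_count[NUM_CLASSES];

static bool tcache_try_push(int c, void *base) {
    if (t_count[c] >= TCACHE_CAP) return false;
    *(void **)base = t_head[c];
    t_head[c] = base;
    t_count[c]++;
    return true;
}

static void *tcache_try_pop(int c) {
    void *base = t_head[c];
    if (base == NULL) return NULL;
    t_head[c] = *(void **)base;
    t_count[c]--;
    return base;
}

/* Stand-in for the existing array cache (layout is an assumption). */
static __thread void *a_slots[NUM_CLASSES][ARRAY_CAP];
static __thread int   a_count[NUM_CLASSES];

/* Returns 1 if the block was cached, 0 if both layers are full. */
static int unified_cache_push(int c, void *base) {
    assert(c >= 0 && c < NUM_CLASSES);
    if (g_tcache_on && tcache_try_push(c, base)) return 1; /* hit: array untouched */
    if (a_count[c] >= ARRAY_CAP) return 0;                 /* fallback path */
    a_slots[c][a_count[c]++] = base;
    return 1;
}

static void *unified_cache_pop(int c) {
    assert(c >= 0 && c < NUM_CLASSES);
    void *base;
    if (g_tcache_on && (base = tcache_try_pop(c)) != NULL) return base; /* hit */
    if (a_count[c] == 0) return NULL;                                   /* fallback */
    return a_slots[c][--a_count[c]];
}
```

With the gate off, both functions behave exactly like the array-only path, which is what makes the same-binary A/B comparison meaningful.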

Patch 4: Sync the refresh in bench_profile

Modified:

  • core/bench_profile.h

Added:

  • Call tiny_tcache_env_refresh_from_env() after bench_setenv_default(...)

3. A/B test (same binary)

Baseline:

HAKMEM_TINY_TCACHE=0 scripts/run_mixed_10_cleanenv.sh

Optimized:

HAKMEM_TINY_TCACHE=1 scripts/run_mixed_10_cleanenv.sh

Optional (cap exploration is research only):

HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=32 scripts/run_mixed_10_cleanenv.sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=64 scripts/run_mixed_10_cleanenv.sh
HAKMEM_TINY_TCACHE=1 HAKMEM_TINY_TCACHE_CAP=128 scripts/run_mixed_10_cleanenv.sh

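4. Visibility (minimal)

If needed, keep tcache hit/miss counts in TLS counters (no atomics). Do a one-shot dump with fprintf(stderr, ...) “only when the winning path needs to be confirmed” (no always-on logging).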
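
One possible shape for these counters, as a hedged sketch: all names below are hypothetical, and the once-per-thread dump guard is an assumption about what "one-shot" means here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names. Each thread increments its own copy, so no atomics. */
static __thread uint64_t t_tcache_hit;
static __thread uint64_t t_tcache_miss;
static __thread int      t_tcache_dumped;

static inline void tiny_tcache_stat_hit(void)  { t_tcache_hit++; }
static inline void tiny_tcache_stat_miss(void) { t_tcache_miss++; }

/* Cold path: prints at most once per thread. Returns 1 if it printed. */
static int tiny_tcache_stat_dump_once(FILE *out) {
    assert(out != NULL);
    if (t_tcache_dumped) return 0;
    t_tcache_dumped = 1;
    uint64_t total = t_tcache_hit + t_tcache_miss;
    double hit_pct = total ? 100.0 * (double)t_tcache_hit / (double)total : 0.0;
    fprintf(out, "[tiny_tcache] hit=%llu miss=%llu hit_rate=%.1f%%\n",
            (unsigned long long)t_tcache_hit,
            (unsigned long long)t_tcache_miss, hit_pct);
    return 1;
}
```

A caller would pass stderr and invoke the dump from a cold location such as thread exit or the end of a benchmark run, so the hot path only ever pays for a single TLS increment.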


5. Promotion (GO only)

On GO:

  1. Add to a preset (MIXED_TINYV3_C7_SAFE only at first)
  2. Record the A/B results in CURRENT_TASK.md
  3. Rollback procedure:
    • export HAKMEM_TINY_TCACHE=0

On NO-GO/NEUTRAL:

  • Freeze as a research box (keep default OFF)