hakmem/docs/specs/ENV_VARS.md

# HAKMEM Environment Variables (Tiny focus)
WIP: Add TLS SLL validation and SuperSlab registry fallback ChatGPT's diagnostic changes to address TLS_SLL_HDR_RESET issue. Current status: Partial mitigation, but root cause remains. Changes Applied: 1. SuperSlab Registry Fallback (hakmem_super_registry.h) - Added legacy table probe when hash map lookup misses - Prevents NULL returns for valid SuperSlabs during initialization - Status: ✅ Works but may hide underlying registration issues 2. TLS SLL Push Validation (tls_sll_box.h) - Reject push if SuperSlab lookup returns NULL - Reject push if class_idx mismatch detected - Added [TLS_SLL_PUSH_NO_SS] diagnostic message - Status: ✅ Prevents list corruption (defensive) 3. SuperSlab Allocation Class Fix (superslab_allocate.c) - Pass actual class_idx to sp_internal_allocate_superslab - Prevents dummy class=8 causing OOB access - Status: ✅ Root cause fix for allocation path 4. Debug Output Additions - First 256 push/pop operations traced - First 4 mismatches logged with details - SuperSlab registration state logged - Status: ✅ Diagnostic tool (not a fix) 5. TLS Hint Box Removed - Deleted ss_tls_hint_box.{c,h} (Phase 1 optimization) - Simplified to focus on stability first - Status: ⏳ Can be re-added after root cause fixed Current Problem (REMAINS UNSOLVED): - [TLS_SLL_HDR_RESET] still occurs after ~60 seconds of sh8bench - Pointer is 16 bytes offset from expected (class 1 → class 2 boundary) - hak_super_lookup returns NULL for that pointer - Suggests: Use-After-Free, Double-Free, or pointer arithmetic error Root Cause Analysis: - Pattern: Pointer offset by +16 (one class 1 stride) - Timing: Cumulative problem (appears after 60s, not immediately) - Location: Header corruption detected during TLS SLL pop Remaining Issues: ⚠️ Registry fallback is defensive (may hide registration bugs) ⚠️ Push validation prevents symptoms but not root cause ⚠️ 16-byte pointer offset source unidentified Next Steps for Investigation: 1. 
Full pointer arithmetic audit (Magazine ⇔ TLS SLL paths) 2. Enhanced logging at HDR_RESET point: - Expected vs actual pointer value - Pointer provenance (where it came from) - Allocation trace for that block 3. Verify Headerless flag is OFF throughout build 4. Check for double-offset application in conversions Technical Assessment: - 60% root cause fixes (allocation class, validation) - 40% defensive mitigation (registry fallback, push rejection) Performance Impact: - Registry fallback: +10-30 cycles on cold path (negligible) - Push validation: +5-10 cycles per push (acceptable) - Overall: < 2% performance impact estimated Related Issues: - Phase 1 TLS Hint Box removed temporarily - Phase 2 Headerless blocked until stability achieved 🤖 Generated with Claude Code (https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 20:42:28 +09:00
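This file summarizes the environment variables related to HAKMEM's Tiny features (Tiny allocator / TLS SLL / SuperSlab).
Because there are many of them, it first gives a quick overview of the commonly used ones, then lists everything in detail by category.
### How this file relates to other documents (current state)
- The results of tallying every `getenv()` call across the repository are collected in `ENV_VARIABLE_SURVEY.md` (228 variables).
- `ENV_VARIABLE_SURVEY.md` assigns each variable a status of **KEEP / CONSOLIDATE / DEPRECATE**.
  - KEEP: variables expected to remain in use for production operation, safety, and major tuning
  - CONSOLIDATE: variables planned to be merged into master switches such as `HAKMEM_DEBUG` / `HAKMEM_TRACE` / `HAKMEM_STATS`
  - DEPRECATE: variables scheduled for gradual removal (new use is discouraged)
- This `ENV_VARS.md` is a practical reference that extracts from that set mainly **the variables you touch day to day around Tiny / SuperSlab / TLS SLL**.
- If you need exhaustive coverage that includes non-Tiny variables, or the detailed status of individual variables, also see:
  - `ENV_VARIABLE_SURVEY.md` (latest survey and statuses)
  - `docs/specs/ENV_VARS_COMPLETE.md` (older but comprehensive reference)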
## Quick cheat sheet (commonly used ENV)
### Core behavior toggles (everyday use)
- `HAKMEM_WRAP_TINY`
  Enables the Tiny allocator (makes it the main path when linked directly or under LD_PRELOAD).
- `HAKMEM_TINY_USE_SUPERSLAB`
  Whether Tiny uses the SuperSlab backend (default ON).
- `HAKMEM_TINY_TLS_SLL`
  Enables the Tiny TLS SLL (a core feature even with Headerless/Phase2).
- `HAKMEM_SAFE_FREE` / `HAKMEM_INVALID_FREE` / `HAKMEM_INVALID_FREE_LOG`
  Safety and invalid-free detection modes for the free() path.
- `HAKMEM_LD_SAFE` / `HAKMEM_LD_BLOCK_JEMALLOC` / `HAKMEM_FORCE_LIBC_ALLOC(_INIT)`
  Safe mode under LD_PRELOAD and coexistence control with jemalloc.
### Key tuning for Tiny/SuperSlab/TLS
- `HAKMEM_TINY_REFILL_MAX` / `HAKMEM_TINY_REFILL_MAX_HOT` / `HAKMEM_TINY_REFILL_MAX_C{0..7}`
  Refill limits for the Tiny TLS cache (global / hot classes / per class).
- `HAKMEM_TINY_SS_ADOPT*` / `HAKMEM_TINY_SS_REQTRACE`
  Behavior of the SuperSlab publish/adopt path and the adopt gate.
- `HAKMEM_TINY_SLL_DRAIN_ENABLE` / `HAKMEM_TINY_SLL_DRAIN_INTERVAL`
  Enable/disable and frequency of the TLS SLL → freelist drain.
### Benchmark/experimental (OFF recommended for normal operation)
- `HAKMEM_BENCH_FAST_FRONT` / `HAKMEM_BENCH_WARMUP` / `HAKMEM_TINY_BENCH_*`
  Benchmark-only fast front / warmup / refill settings.
- `HAKMEM_TINY_ULTRA*` / `HAKMEM_TINY_BENCH_SLL_ONLY`
  Experimental paths such as Ultra Tiny and SLL-only.
### Debug/trace/stats (master switches)
- `HAKMEM_DEBUG_ALL` / `HAKMEM_DEBUG_LEVEL` / `HAKMEM_QUIET`
  Bulk ON/OFF for all debugging, and verbosity.
- `HAKMEM_TRACE` / `HAKMEM_TRACE_LEVEL`
  Trace targets (ptr/refill/free/mailbox/...) and verbosity.
- `HAKMEM_STATS` / `HAKMEM_STATS_DUMP`
  Enables the stats module and the dump at exit.
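
As a rough illustration of how the core toggles from the cheat sheet combine on a command line, the sketch below wraps an arbitrary command with three of them. The function name `run_with_tiny` is hypothetical, and the value `1` for `HAKMEM_TINY_TLS_SLL` is an assumption (taken to mean "enabled"); only the variable names come from this document. `env` stands in for a HAKMEM-linked binary so the snippet runs without a build.

```shell
# Hypothetical wrapper: run any command with a few core Tiny toggles set.
# Only the variable names come from this document. The value 1 for
# HAKMEM_TINY_TLS_SLL is an assumption (taken to mean "enabled").
run_with_tiny() {
    HAKMEM_WRAP_TINY=1 \
    HAKMEM_TINY_USE_SUPERSLAB=1 \
    HAKMEM_TINY_TLS_SLL=1 \
    "$@"
}

# Demo: `env` stands in for a HAKMEM-linked binary and prints what it would see.
run_with_tiny env | grep '^HAKMEM_' | sort
```

Passing the variables as command prefixes keeps them scoped to the launched process, which makes it harder for a forgotten `export` to leak into a later A/B run.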
The sections below describe every ENV, including the ones above, in detail by category.
---
## Core toggles
- HAKMEM_WRAP_TINY=1
  - Enables the Tiny allocator (direct link)
- HAKMEM_TINY_USE_SUPERSLAB=0/1
  - Turns the SuperSlab path ON/OFF (default ON)
## SFC (Super Front Cache) stats / A/B
- HAKMEM_SFC_ENABLE=0/1
  - Enables Box 5-NEW: Super Front Cache (default OFF; for A/B)
- HAKMEM_SFC_CAPACITY=16..256 / HAKMEM_SFC_REFILL_COUNT=8..256
  - Capacity and refill count of the SFC (e.g. 256/128)
- HAKMEM_SFC_STATS_DUMP=1
  - Dumps SFC statistics to stderr at process exit (alloc_hits/misses, refill_calls, etc.).
  - Usage: `make CFLAGS+=" -DHAKMEM_DEBUG_COUNTERS=1" larson_hakmem; HAKMEM_SFC_ENABLE=1 HAKMEM_SFC_STATS_DUMP=1 ./larson_hakmem …`
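
An SFC A/B run could be scripted roughly as follows. This is a minimal sketch: `sfc_ab` is a hypothetical helper, the capacity/refill values are the 256/128 example from above, and `env` is a stand-in for the real benchmark invocation so the snippet runs without a HAKMEM build.

```shell
# Hypothetical A/B helper: run the same command with SFC off, then on.
# Replace `env` in the demo call with the real benchmark command line.
sfc_ab() {
    for enable in 0 1; do
        echo "== HAKMEM_SFC_ENABLE=$enable =="
        HAKMEM_SFC_ENABLE="$enable" \
        HAKMEM_SFC_CAPACITY=256 \
        HAKMEM_SFC_REFILL_COUNT=128 \
        HAKMEM_SFC_STATS_DUMP=1 \
        "$@"
    done
}

# Demo: show the SFC-related environment each run would receive.
sfc_ab env | grep -E '^(==|HAKMEM_SFC_)'
```

Holding capacity and refill count fixed across both runs keeps the comparison limited to the enable switch itself.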
## Larson defaults (publish→mail→adopt)
- `scripts/run_larson_defaults.sh` is provided so that easily forgotten required variables can be set in one go.
- By default it exports the following (A/B values can be overridden via environment variables):
  - `HAKMEM_TINY_USE_SUPERSLAB=1` / `HAKMEM_TINY_MUST_ADOPT=1` / `HAKMEM_TINY_SS_ADOPT=1`
  - `HAKMEM_TINY_FAST_CAP=64`
  - `HAKMEM_TINY_FAST_SPARE_PERIOD=8` ← returns blocks from the fast tier to the Superslab to create a starting point for publish
  - `HAKMEM_TINY_MAILBOX_SLOWDISC=1`
  - `HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD=256`
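
The "export a default, but let the caller override it" behavior can be sketched with POSIX `${VAR:-default}` expansion. This is an illustration of the pattern only, not the actual contents of `scripts/run_larson_defaults.sh`; the function name `larson_defaults` is hypothetical.

```shell
# Sketch of a "default unless already set" exporter. This is NOT the actual
# content of scripts/run_larson_defaults.sh; the function name is hypothetical.
larson_defaults() {
    export HAKMEM_TINY_USE_SUPERSLAB="${HAKMEM_TINY_USE_SUPERSLAB:-1}"
    export HAKMEM_TINY_MUST_ADOPT="${HAKMEM_TINY_MUST_ADOPT:-1}"
    export HAKMEM_TINY_SS_ADOPT="${HAKMEM_TINY_SS_ADOPT:-1}"
    export HAKMEM_TINY_FAST_CAP="${HAKMEM_TINY_FAST_CAP:-64}"
    export HAKMEM_TINY_FAST_SPARE_PERIOD="${HAKMEM_TINY_FAST_SPARE_PERIOD:-8}"
    export HAKMEM_TINY_MAILBOX_SLOWDISC="${HAKMEM_TINY_MAILBOX_SLOWDISC:-1}"
    export HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD="${HAKMEM_TINY_MAILBOX_SLOWDISC_PERIOD:-256}"
}

# A value set by the caller survives; everything else gets the default.
HAKMEM_TINY_FAST_CAP=128
larson_defaults
echo "FAST_CAP=$HAKMEM_TINY_FAST_CAP SPARE_PERIOD=$HAKMEM_TINY_FAST_SPARE_PERIOD"
```

Note that `:-` also treats an empty value as unset, so `HAKMEM_TINY_FAST_CAP=` would fall back to the default in this sketch.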
## Front Gate (A/B for boxified fast path)
- `HAKMEM_TINY_FRONT_GATE_BOX=1` — Use Front Gate Box implementation (SFC→SLL) for fast-path pop/push/cascade. Default 0. Safe to toggle during builds via `make EXTRA_CFLAGS+=" -DHAKMEM_TINY_FRONT_GATE_BOX=1"`.
- Debug visibility (optional): `HAKMEM_TINY_RF_TRACE=1`
- Force-notify (optional, debugging aid): `HAKMEM_TINY_RF_FORCE_NOTIFY=1`
- Per mode (tput/pf), the Superslab size and cache/precharge are also set:
  - tput: `HAKMEM_TINY_SS_FORCE_LG=21`, `HAKMEM_TINY_SS_CACHE=0`, `HAKMEM_TINY_SS_PRECHARGE=0`
  - pf: `HAKMEM_TINY_SS_FORCE_LG=20`, `HAKMEM_TINY_SS_CACHE=4`, `HAKMEM_TINY_SS_PRECHARGE=1`
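
The two presets listed above can be expressed as a small selector. This is a sketch with a hypothetical function name (`apply_ss_mode`); the values are copied from the tput/pf list. If `FORCE_LG` is the base-2 logarithm of the SuperSlab size in bytes (an assumption; the text here does not say), 21 and 20 would correspond to 2 MiB and 1 MiB.

```shell
# Hypothetical selector for the two presets; values are copied from the list
# above, the function name and error handling are illustrative.
apply_ss_mode() {
    case "$1" in
        tput) lg=21; cache=0; precharge=0 ;;
        pf)   lg=20; cache=4; precharge=1 ;;
        *)    echo "unknown mode: $1 (expected tput or pf)" >&2; return 1 ;;
    esac
    export HAKMEM_TINY_SS_FORCE_LG="$lg"
    export HAKMEM_TINY_SS_CACHE="$cache"
    export HAKMEM_TINY_SS_PRECHARGE="$precharge"
}

apply_ss_mode pf
echo "LG=$HAKMEM_TINY_SS_FORCE_LG CACHE=$HAKMEM_TINY_SS_CACHE PRECHARGE=$HAKMEM_TINY_SS_PRECHARGE"
```

Rejecting unknown mode names up front avoids silently running a benchmark with a half-applied preset.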
## Ultra Tiny (SLL-only, experimental)
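- HAKMEM_TINY_ULTRA=0/1
  - Turns Ultra Tiny mode ON/OFF (a minimal, SLL-centric hot path)
- HAKMEM_TINY_ULTRA_VALIDATE=0/1
  - Validates the Ultra SLL head (use 1 when prioritizing safety; 0 is recommended for performance measurement)
- HAKMEM_TINY_ULTRA_BATCH_C{0..7}=N
  - Per-class refill batch override (e.g. class=3 (64B) → C3)
- HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}=N
  - Per-class SLL cap override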
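
The `C{0..7}` suffix means one variable per size class. A small helper can build the names and reject out-of-range classes; the sketch below assumes nothing about good values (32 and 128 are placeholders, not tuned recommendations), and `ultra_set_class` is a hypothetical name.

```shell
# Hypothetical helper: set the Ultra per-class overrides for one size class.
# The numeric values passed below are placeholders, not tuned recommendations.
ultra_set_class() {
    class="$1"; batch="$2"; sll_cap="$3"
    case "$class" in
        [0-7]) ;;
        *) echo "class must be 0..7, got: $class" >&2; return 1 ;;
    esac
    export "HAKMEM_TINY_ULTRA_BATCH_C${class}=${batch}"
    export "HAKMEM_TINY_ULTRA_SLL_CAP_C${class}=${sll_cap}"
}

# class 3 (64B) maps to the C3 suffix.
ultra_set_class 3 32 128
env | grep '^HAKMEM_TINY_ULTRA_' | sort
```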
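## SuperSlab adopt/publish (experimental)
- HAKMEM_TINY_SS_ADOPT=0/1
  - Enables SuperSlab publish/adopt + remote drain + owner handoff (default OFF).
  - An experimental switch for raising reuse density in workloads with many cross-thread frees, such as 4T Larson.
  - When ON, some single-thread (1T) performance may drop, so use it on an A/B basis.
  - Note: even when the environment variable is unset, it switches ON automatically (auto-on) once a cross-thread free is detected at runtime.
- HAKMEM_TINY_SS_ADOPT_COOLDOWN=4
  - Cooldown before adopt is retried (per thread). 0 = disabled.
- HAKMEM_TINY_SS_ADOPT_BUDGET=8
  - Maximum number of adopt attempts inside superslab_refill() (0-32).
- HAKMEM_TINY_SS_ADOPT_BUDGET_C{0..7}
  - Per-class adopt budget override (0-32). When set, it takes precedence over `HAKMEM_TINY_SS_ADOPT_BUDGET`.
- HAKMEM_TINY_SS_REQTRACE=1
  - Emits a lightweight request trace of the harvest gate (guard), the ENOMEM fallback, and slab/SS adoption to stderr.
- HAKMEM_TINY_RF_FORCE_NOTIFY=0/1 (debugging aid)
  - Forces a publish notification when `slab_listed==0`, even if the remote queue is already non-empty (old!=0).
  - Useful for flushing out cases where the initial empty→non-empty notification may have been missed (A/B recommended).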
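
The precedence between the per-class and global adopt budgets can be previewed with a small shell function. This only illustrates the rule stated above: HAKMEM's real parser is C code, and both the fallback default of 8 and the clamping of out-of-range values to 0-32 are assumptions made for this sketch. `adopt_budget` is a hypothetical name.

```shell
# Illustration of the documented precedence only. HAKMEM's real parser is C
# code; the default of 8 and the clamping of out-of-range values are assumptions.
adopt_budget() {
    class="$1"
    case "$class" in
        [0-7]) ;;
        *) echo "class must be 0..7, got: $class" >&2; return 1 ;;
    esac
    # Per-class override first, then the global budget, then the assumed default.
    eval "value=\${HAKMEM_TINY_SS_ADOPT_BUDGET_C${class}:-}"
    if [ -z "$value" ]; then
        value="${HAKMEM_TINY_SS_ADOPT_BUDGET:-8}"
    fi
    # Keep the result inside the documented 0-32 range.
    if [ "$value" -lt 0 ]; then value=0; fi
    if [ "$value" -gt 32 ]; then value=32; fi
    echo "$value"
}

HAKMEM_TINY_SS_ADOPT_BUDGET=16
HAKMEM_TINY_SS_ADOPT_BUDGET_C2=4
echo "class 2: $(adopt_budget 2)"   # per-class override wins
echo "class 5: $(adopt_budget 5)"   # falls back to the global budget
```

The class index is validated before it reaches `eval`, so only the eight documented variable names can ever be looked up.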
## Ready List (box for Refill optimization)
- 2025-12 cleanup: the Ready-family ENV have been retired. The Ready ring is always enabled, with fixed width and budget (width=TINY_READY_RING, budget=1).
## Background Remote Drain (bundling box, lightweight step)
- 2025-12 cleanup: the BG Remote ENV (`HAKMEM_TINY_BG_REMOTE*`) have been retired. The BG remote/aggregator is fixed OFF.
## Ready Aggregator (BG, non-destructive peek)
- 2025-12 cleanup: the Ready Aggregator ENV have also been retired (fixed OFF).
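## A/B testing the Registry window scan cost
- HAKMEM_TINY_REG_SCAN_MAX=N
  - Maximum number of entries scanned in the Registry's "small window" (default 256).
  - Lower values reduce the scan cost in superslab_refill() and in the gate just before mmap, but the adopt hit rate may drop and OOMs or new mmaps may increase.
  - When the hit rate is high (e.g. TinyHot), A/B testing with values such as 64 or 128 is recommended.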
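## Simplified refill for Mid (fewer branches for 128-1024B)
- HAKMEM_TINY_MID_REFILL_SIMPLE=0/1
  - For class>=4 (128B and up), skips the multi-stage sticky/hot/mailbox/registry/adopt search and simplifies the refill to the following order:
    1) if the existing TLS SuperSlab has an unused Slab, initialize it directly and bind it;
    2) otherwise, allocate a new SuperSlab and bind its first Slab.
  - Purpose: reduce branching and scanning inside superslab_refill() (for throughput-oriented A/B runs).
  - Caution: adopt opportunities decrease, so page faults (PF) and memory efficiency will vary. Always A/B before regular use.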
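Refill batch for Mid (SLL reinforcement)
- HAKMEM_TINY_REFILL_COUNT_MID=N
  - Overrides the number of blocks carved on an SLL refill for class>=4 (128B and up) (default: max_take or the remaining headroom).
  - Example: A/B with 32/64/96. The SLL is less likely to run dry, which may lower the refill frequency.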
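Relaxed read of the remote head on the alloc side (A/B)
- HAKMEM_TINY_ALLOC_REMOTE_RELAX=0/1
  - In hak_tiny_alloc_superslab(), performs the non-zero check of `remote_heads[slab_idx]` with a relaxed load (default: acquire).
  - The order "acquire ownership, then drain" is preserved, so this is safe. For A/B runs aiming at a lower branch rate and less load pressure.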
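
The two load modes can be sketched with C11 atomics. This is a minimal illustration under assumptions, not HAKMEM's code: `remote_head`, `remote_push_sketch`, and `remote_head_nonzero_sketch` are hypothetical names standing in for one `remote_heads[slab_idx]` entry and its check.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a single remote_heads[slab_idx] entry. */
static _Atomic uintptr_t remote_head;

/* A remote free publishes a block with release ordering. */
static void remote_push_sketch(uintptr_t block) {
    atomic_store_explicit(&remote_head, block, memory_order_release);
}

/* Returns true when the remote list looks non-empty.
 * relax != 0 models HAKMEM_TINY_ALLOC_REMOTE_RELAX=1 (relaxed load);
 * relax == 0 models the default (acquire load). */
static bool remote_head_nonzero_sketch(int relax) {
    memory_order mo = relax ? memory_order_relaxed : memory_order_acquire;
    return atomic_load_explicit(&remote_head, mo) != 0;
}
```

In this sketch the relaxed load is only a cheap hint. As the description above notes, it is the ownership acquisition that follows which orders the subsequent drain.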
Front refill amount (A/B)
- HAKMEM_TINY_REFILL_COUNT=N (common to all classes)
- HAKMEM_TINY_REFILL_COUNT_HOT=N (class<=3)
- HAKMEM_TINY_REFILL_COUNT_MID=N (class>=4)
- HAKMEM_TINY_REFILL_COUNT_C{0..7}=N (per class)
  - Controls the refill count of tiny_alloc_fast (default 16). Larger values lower the miss frequency but raise the cost of each refill.
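
One plausible way to combine the four refill-count variables is sketched below. The precedence (per-class, then HOT/MID group, then global, then the default 16) is an assumption made for illustration; the text above does not state it. `parse_pos_int` and `resolve_refill_count` are hypothetical helpers, and the caller is expected to pass in the `getenv()` results.

```c
#include <stdlib.h>

/* Parses a positive decimal integer; returns 0 when the string is
 * NULL, empty, malformed, or not positive. */
static int parse_pos_int(const char *s) {
    if (!s || !*s) return 0;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > 65535) return 0;
    return (int)v;
}

/* Assumed precedence: per-class > HOT/MID group > global > default 16.
 * Arguments are raw environment strings, for example
 *   per_class = getenv("HAKMEM_TINY_REFILL_COUNT_C3")
 *   group     = getenv("HAKMEM_TINY_REFILL_COUNT_HOT")
 *   global    = getenv("HAKMEM_TINY_REFILL_COUNT")          */
static int resolve_refill_count(const char *per_class,
                                const char *group,
                                const char *global) {
    int v = parse_pos_int(per_class);
    if (v) return v;
    v = parse_pos_int(group);
    if (v) return v;
    v = parse_pos_int(global);
    return v ? v : 16;
}
```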
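Important: prerequisite for publish/adopt (SuperSlab ON)
- HAKMEM_TINY_USE_SUPERSLAB=1
  - The publish→mailbox→adopt pipeline only runs when the SuperSlab path is ON.
  - Keeping it ON (the default) is recommended for benchmarks (it can be turned OFF in A/B runs for a memory-efficiency-oriented comparison).
  - When OFF, [Publish Pipeline]/[Publish Hits] stay at 0.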
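SuperSlab cache / precharge (Phase 6.24+)
- HAKMEM_TINY_SS_CACHE=N
  - Upper limit of the SuperSlab cache, common to all classes (number of SuperSlabs kept per class). 0 = unlimited; unset = disabled.
  - With the cache enabled, `superslab_free()` does not munmap an empty SuperSlab immediately but pushes it onto the cache for reuse.
- HAKMEM_TINY_SS_CACHE_C{0..7}=N
  - Per-class cache limit (individual setting). For classes where it is set, it takes precedence over `HAKMEM_TINY_SS_CACHE`.
- HAKMEM_TINY_SS_PRECHARGE=N
  - Pre-allocates N SuperSlabs per Tiny class and pools them in the cache. 0 = disabled.
  - Pre-allocated SuperSlabs are prefaulted in a `MAP_POPULATE`-equivalent way, suppressing page faults on first access.
  - Setting this automatically enables the cache as well (to hold the precharged SuperSlabs).
- HAKMEM_TINY_SS_PRECHARGE_C{0..7}=N
  - Per-class precharge count (individual override). Example: precharge 4 SuperSlabs for the 8B class only → `HAKMEM_TINY_SS_PRECHARGE_C0=4`
- HAKMEM_TINY_SS_POPULATE_ONCE=1
  - Faults in the next SuperSlab obtained via `mmap` with `MAP_POPULATE`, one time only (a one-shot pre-touch for A/B).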
SuperSlab prefault (front-loading to reduce PF)
- HAKMEM_SS_PREFAULT=0/1/2/3
  - 0: OFF (safe default; only the one-shot `MAP_POPULATE` via `g_ss_populate_once`)
  - 1: POPULATE: always passes `MAP_POPULATE` to the `mmap` of a new SuperSlab so that the kernel resolves page faults up front (for perf measurement).
  - 2: TOUCH: in addition to `MAP_POPULATE`, touches the whole SuperSlab once at a 4KB stride via `ss_prefault_region()` (for experiments that want PF close to zero).
  - 3: ASYNC: reserved (currently treated the same as TOUCH; planned to be extended for BG-thread prefault).
- Box: `core/box/ss_prefault_box.h` (policy decision) + `core/box/ss_allocation_box.c` (execution right after mmap).
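
The TOUCH mode above amounts to one access per 4KB page. A minimal sketch of such a loop follows; `prefault_touch_sketch` is a hypothetical name, and the body is an assumption about the general technique, not the actual `ss_prefault_region()` implementation.

```c
#include <stddef.h>

enum { PREFAULT_STRIDE = 4096 };  /* 4KB stride, as described for mode 2 */

/* Touches one byte per 4KB step of [base, base + size) and returns the
 * number of touches. Writing back the value just read leaves the memory
 * contents unchanged while still forcing each page to be mapped writable. */
static size_t prefault_touch_sketch(void *base, size_t size) {
    volatile unsigned char *p = (volatile unsigned char *)base;
    size_t touched = 0;
    for (size_t off = 0; off < size; off += PREFAULT_STRIDE) {
        p[off] = p[off];
        touched++;
    }
    return touched;
}
```

The `volatile` qualifier keeps the compiler from optimizing the self-assignment away.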
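Harvest / Guard (harvest gate before mmap)
- HAKMEM_TINY_SS_CAP=N
  - Upper limit on SuperSlabs for each Tiny class (0 = unlimited).
- HAKMEM_TINY_SS_CAP_C{0..7}=N
  - Per-class limit, set individually (0 = unlimited).
- HAKMEM_TINY_GLOBAL_WATERMARK_MB=MB
  - Forces a harvest when the total allocated bytes exceed the threshold in MB (0 = disabled).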
Counters (dump)
- HAKMEM_TINY_COUNTERS_DUMP=1
  - Dumps the extended counters to standard error (per class).
  - In addition to SS adopt/publish, prints Slab adopt/publish/requeue/miss.
  - [Publish Pipeline]: notify_calls / same_empty_pubs / remote_transitions / mailbox_reg_calls / mailbox_slow_disc
  - [Free Pipeline]: ss_local / ss_remote / tls_sll / magazine
- `HAKMEM_STATS=counters` / `HAKMEM_STATS=refill` can also enable these in bulk (via the master box).
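Safety (free validation)
- HAKMEM_SAFE_FREE=1
  - Enables extra validation at the free boundary (detects out-of-range SuperSlab pointers, class mismatches, and dangerous double frees).
  - Recommended as the default while debugging; 0 is recommended for perf measurement.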
P2 TLS SLL Redesign (Header/Next conflict fix)
- HAKMEM_TINY_ACTIVE_TRACK=1
  - Enables meta->active / meta->tls_cached tracking.
    - active: number of blocks currently held by the user
    - tls_cached: number of blocks cached in the TLS SLL
    - Invariant: active + tls_cached ≈ used
  - When enabled, ss_is_slab_empty() reports EMPTY when active==0 (blocks cached in the TLS SLL are taken into account).
  - Overhead: about 1% (atomic inc/dec per alloc/free).
- HAKMEM_TINY_NO_CLASS_MAP=1
  - Disables the class_map lookup (legacy mode).
  - Default: class_map ON (made the default in P2.1).
  - Reverts to the old behavior of reading class_idx from the Header (risk of Header/Next conflict).
- HAKMEM_TINY_RESTORE_HEADER=1
  - Forces Header restoration in tiny_next_store() (legacy mode).
  - Default: Header restoration OFF (disabled in P2.3).
  - With class_map in use, Header restoration is unnecessary (the Header is rewritten by HAK_RET_ALLOC at alloc time).
- HAKMEM_TINY_INVARIANT_CHECK=1
  - Enables verification of the invariant active + tls_cached ≈ used (debug builds).
  - On violation, prints a warning to stderr (only when NDEBUG is not defined).
  - Overhead: about 2% (only when ss_verify_superslab_invariants() is called).
- HAKMEM_TINY_INVARIANT_DUMP=1
  - Enables periodic dumps of slab state (debug builds, only when NDEBUG is not defined).
  - Prints the breakdown of used/active/tls_cached/capacity/class to stderr.
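
The invariant and the EMPTY rule described above can be sketched as follows. `SlabCountsSketch`, the `slack` tolerance, and both function names are hypothetical; in particular, reading "≈" as "within a small slack" is an assumption, since the text does not say how much drift is tolerated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the per-slab counters named above. */
typedef struct {
    uint32_t used;        /* blocks carved out of the slab */
    uint32_t active;      /* blocks currently held by the user */
    uint32_t tls_cached;  /* blocks sitting in a TLS SLL */
} SlabCountsSketch;

/* active + tls_cached should match used, up to `slack` blocks of drift
 * (assumed tolerance for counters that are updated independently). */
static bool slab_invariant_holds(const SlabCountsSketch *c, uint32_t slack) {
    uint32_t sum = c->active + c->tls_cached;
    uint32_t diff = sum > c->used ? sum - c->used : c->used - sum;
    return diff <= slack;
}

/* With active tracking on, a slab counts as EMPTY once no block is held
 * by the user, even if some blocks are still cached in a TLS SLL. */
static bool slab_is_empty_sketch(const SlabCountsSketch *c) {
    return c->active == 0;
}
```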
Frontend (mimalloc-inspired, experimental)
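- HAKMEM_INT_ADAPT_REFILL=0/1
  - INT adjusts the refill limit (`HAKMEM_TINY_REFILL_MAX(_HOT)`) by ±16 per window (default ON).
- HAKMEM_INT_ADAPT_CAPS=0/1
  - INT lightly adjusts the per-class MAG/SLL caps (±16/±32): hot classes get slightly wider caps, infrequently used ones are shrunk (default ON).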
Other useful
New (debug isolation)
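- HAKMEM_TINY_DISABLE_READY=0/1
  - Completely stops the Ready/Mailbox consumer path (default 0, i.e. the path stays ON). For TSan/ASan isolation experiments that exercise SS+freelist only.
- HAKMEM_DEBUG_SEGV=0/1
  - Registers an early SIGSEGV handler and prints a backtrace to stderr once (may produce no output in some environments).
- HAKMEM_FORCE_LIBC_ALLOC_INIT=0/1
  - Forces malloc/free to be routed to libc only from process start until hak_init() completes (avoids dlsym→malloc recursion and accesses to uninitialized TLS during initialization). After init completes, routing automatically returns to the normal path (the setting has no effect after init, even if the env is still set).
- HAKMEM_TINY_MAG_CAP=N
  - Upper limit of the TLS magazine (used for tuning the normal path).
- HAKMEM_TINY_MAG_CAP_C{0..7}=N
  - Per-class TLS magazine limit (normal path). When set, overrides the class's default (example: set 512 for 64B = class3).
- HAKMEM_TINY_TLS_SLL=0/1
  - Turns the normal-path SLL ON/OFF.
- HAKMEM_TINY_SLL_CAP_C{0..7}=N
  - Per-class SLL limit for the normal path (absolute value). When set, bypasses the multiplier calculation.
- HAKMEM_TINY_REFILL_MAX=N
  - Upper limit for bulk refill at the magazine low-water mark (default 64). Larger values reduce the number of refills but raise momentary memory pressure.
- HAKMEM_TINY_REFILL_MAX_HOT=N
  - Higher limit for the 8/16/32/64B classes (class<=3) (default 192). For peak exploration in the small-size range.
- HAKMEM_TINY_REFILL_MAX_C{0..7}=N
  - Per-class refill limit (individual override). Only effective for classes where it is set (0 = unset).
- HAKMEM_TINY_REFILL_MAX_HOT_C{0..7}=N
  - Individual override for the hot classes (0..3). When set, takes precedence over `REFILL_MAX_HOT`.
- (removed) HAKMEM_TINY_BG_REMOTE*
  - 2025-12 cleanup: the BG Remote ENVs were removed (BG remote is fixed OFF).
- HAKMEM_TINY_REFILL_COUNT=N (for ULTRA_SIMPLE)
  - Number of blocks per SLL refill in ULTRA_SIMPLE (default 32, range 8-256).
- HAKMEM_TINY_FLUSH_ON_EXIT=0/1
  - Flushes and trims the Tiny magazines at exit (for RSS measurement).
- HAKMEM_TINY_RSS_BUDGET_KB=N
  - Sets the Tiny RSS budget (kB) when the INT engine starts. When exceeded, the per-class MAG/SLL caps are shrunk step by step (memory first).
- HAKMEM_TINY_INT_TIGHT=0/1
  - Biases INT adjustments toward shrinking (raises the thresholds and pushes the MAG/SLL minimums toward the floor).
- HAKMEM_TINY_DIET_STEP=N (new, default 16)
  - Shrink amount per step when over budget (MAG: step, SLL: step×2).
- HAKMEM_TINY_CAP_FLOOR_C{0..7}=N
  - Per-class lower bound for MAG (example: C0=64, C3=128). INT never shrinks below this.
- HAKMEM_DEBUG_COUNTERS=0/1
  - Includes the path/Ultra debug counters in the build (default 0 = compiled out). When ON, dumps at atexit if `HAKMEM_TINY_PATH_DEBUG=1`.
- HAKMEM_ENABLE_STATS
  - `stats_record_alloc/free` run on the hot path only when this is defined. When undefined, they are never called (minimal for benchmarks).
- HAKMEM_TINY_TRACE_RING=1
  - Enables the Tiny Debug Ring. On `SIGUSR2` or a crash, dumps the most recent 4096 alloc/free/publish/remote events to stderr.
- HAKMEM_TINY_STAT_SAMPLING (build define, optional) / HAKMEM_TINY_STAT_RATE_LG (environment, optional)
  - Even with stats enabled, lowers the frequency of alloc-side stats updates (example: RATE_LG=14 → once every 16384 calls).
  - Default is OFF (no sampling, update every time). Turning it ON for benchmarks can reduce the instruction count.
- HAKMEM_TINY_HOTMAG=0/1
  - Enables a small TLS magazine for small classes (128 entries, classes 0..3). Default 0 (for A/B).
  - alloc: tries for a hit in the order HotMag→SLL→Magazine. free: SLL first; on overflow, HotMag→Magazine.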
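
The `HAKMEM_TINY_STAT_RATE_LG` entry in the list above describes power-of-two sampling (RATE_LG=14 gives one update per 16384 calls). A common way to get that behavior is a counter masked with `2^rate_lg - 1`, sketched below. `stat_sample_hit` and the caller-owned counter are hypothetical, and whether HAKMEM uses exactly this scheme is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true once every 2^rate_lg calls.
 * rate_lg == 0 means "no sampling": every call is a hit. */
static bool stat_sample_hit(uint32_t *counter, unsigned rate_lg) {
    uint32_t mask = (rate_lg >= 32) ? UINT32_MAX
                                    : ((UINT32_C(1) << rate_lg) - 1u);
    return ((*counter)++ & mask) == 0;
}
```

The hot path then pays one increment and one AND per allocation, and runs the full stats update only on a hit.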
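### Superslab Tiering / Registry control
- HAKMEM_SS_TIER_DOWN_THRESHOLD
  - Type: float (0.0-1.0)
  - Default: `0.25`
  - Role: lower bound; when SuperSlab utilization (`total_active_blocks / capacity`) falls to this value or below, the Tier drops `HOT→DRAINING`.
  - Effect: SuperSlabs in the DRAINING Tier are excluded from new allocs and become targets for drain/release (Box: `ss_tier_box.h`).
- HAKMEM_SS_TIER_UP_THRESHOLD
  - Type: float (0.0-1.0)
  - Default: `0.50`
  - Role: upper bound (hysteresis); when the utilization of a DRAINING-Tier SuperSlab rises to this value or above, it returns `DRAINING→HOT`.
  - Effect: makes it harder to flip between HOT and DRAINING when utilization fluctuates temporarily, preventing Tier oscillation.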
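
The two thresholds form a simple hysteresis band. The transition rule can be sketched as below, using the documented defaults in the usage; `TierSketch` and `tier_next` are hypothetical names, and only the HOT/DRAINING pair described in this section is modeled.

```c
typedef enum { TIER_HOT, TIER_DRAINING } TierSketch;

/* util = total_active_blocks / capacity, in [0.0, 1.0].
 * down = HAKMEM_SS_TIER_DOWN_THRESHOLD, up = HAKMEM_SS_TIER_UP_THRESHOLD. */
static TierSketch tier_next(TierSketch cur, double util, double down, double up) {
    if (cur == TIER_HOT && util <= down) return TIER_DRAINING;
    if (cur == TIER_DRAINING && util >= up) return TIER_HOT;
    return cur;  /* inside the band: keep the current tier */
}
```

With `down=0.25` and `up=0.50`, a SuperSlab hovering around 30-40% utilization keeps whichever tier it already has, which is the oscillation-damping effect described above.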
### Tiny Front Routing (switching between Tiny and Pool)
- HAKMEM_TINY_PROFILE
  - Type: string
  - Default: `"conservative"`
  - Role: profile that switches, per class, the routing policy between the Tiny Front (TLS SLL / FastCache) and the Pool/backend.
  - Profiles:
    - `"conservative"` (default):
      - C0-C7 all `TINY_FIRST` (Tiny front first, falling back to Pool/backend on failure)
    - `"hot"`:
      - C0-C3: `TINY_ONLY` (actively use the small classes exclusively through Tiny)
      - C4-C6: `TINY_FIRST`
      - C7: `POOL_ONLY` (1KB headerless is left to the Pool)
    - `"off"`:
      - C0-C7 all `POOL_ONLY` (Tiny front completely disabled)
    - `"full"`:
      - C0-C7 all `TINY_ONLY` (for microbenchmarks; the Gate always goes through Tiny)
  - Implementation:
    - Box: `core/box/tiny_route_box.h` / `tiny_route_box.c`
    - Gate: `tiny_alloc_gate_fast()` consults `TinyRoutePolicy` per class to route between Tiny and Pool.
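
The profile table above maps naturally onto an 8-entry array indexed by class. The sketch below fills such a table from a profile name. `RouteSketch`, `route_table_init`, and the fallback of unknown or missing names to the conservative profile are assumptions for illustration, not the contents of `tiny_route_box.c`.

```c
#include <string.h>

typedef enum { ROUTE_TINY_ONLY, ROUTE_TINY_FIRST, ROUTE_POOL_ONLY } RouteSketch;

/* Fills table[0..7] (one entry per class C0..C7) from a profile name.
 * NULL or an unrecognized name yields the "conservative" profile (assumed). */
static void route_table_init(const char *profile, RouteSketch table[8]) {
    for (int c = 0; c < 8; c++) table[c] = ROUTE_TINY_FIRST;
    if (!profile) return;
    if (strcmp(profile, "hot") == 0) {
        for (int c = 0; c <= 3; c++) table[c] = ROUTE_TINY_ONLY;
        table[7] = ROUTE_POOL_ONLY;            /* C4..C6 stay TINY_FIRST */
    } else if (strcmp(profile, "off") == 0) {
        for (int c = 0; c < 8; c++) table[c] = ROUTE_POOL_ONLY;
    } else if (strcmp(profile, "full") == 0) {
        for (int c = 0; c < 8; c++) table[c] = ROUTE_TINY_ONLY;
    }
}
```

Resolving the profile once into a table keeps the per-allocation decision down to a single indexed load.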
## USDT/tracepoints (user-space static tracing for perf)
- Adding `CFLAGS+=-DHAKMEM_USDT=1` at build time embeds USDT (DTrace-compatible) probes at the major branches.
  - Dependency: `<sys/sdt.h>` (Debian/Ubuntu: `sudo apt-get install systemtap-sdt-dev`).
- Probe names (provider=hakmem), examples:
  - `sll_pop`, `mag_pop`, `front_pop` (alloc hot path)
  - `bump_hit` (TLS bump shadow hit)
  - `slow_alloc` (slow path entry)
- Usage (examples):
  - List: `perf list 'sdt:hakmem:*'`
  - Count: `perf stat -e sdt:hakmem:front_pop,cycles ./bench_tiny_hot_hakmem 32 100 40000`
  - Record: `perf record -e sdt:hakmem:sll_pop -e sdt:hakmem:mag_pop ./bench_tiny_hot_hakmem 32 100 50000`
- Permission/environment notes:
  - `unknown tracepoint` → perf does not support USDT (sdt:), or the tool is old. `sudo apt-get install linux-tools-$(uname -r)` is recommended.
  - `can't access trace events` → insufficient tracefs permissions.
    - `sudo mount -t tracefs -o mode=755 nodev /sys/kernel/tracing`
    - `sudo sysctl kernel.perf_event_paranoid=1`
  - On some kernels such as WSL, UPROBE/USDT may be disabled (falls back to PMU only).
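## Build preset (TinyHot shortest front)
- Compile-time flag: `-DHAKMEM_TINY_MINIMAL_FRONT=1`
  - Physically removes UltraFront/Quick/Frontend/HotMag/SuperSlab try/BumpShadow from the entry path
  - Remaining path: `SLL → TLS Magazine → SuperSlab → (subsequent slow path)`
- Makefile target: `make bench_tiny_front`
  - Removes branches that do not suit the benchmark and shortens the instruction sequence (recommended together with PGO)
  - Added flag: `-DHAKMEM_TINY_MAG_OWNER=0` (skips the owner write for magazine entries, reducing alloc/free store load)
- Runtime switch (lightweight A/B): `HAKMEM_TINY_MINIMAL_HOT=1`
  - Prefers the SuperSlab TLS bump → SuperSlab direct path at the entry (a branch, not a build-time removal)
  - Generally a disadvantage on TinyHot (more instructions and branches), so default OFF. For benchmark A/B only.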
## Scripts
- scripts/run_tiny_hot_triad.sh <cycles>
- scripts/run_tiny_benchfast_triad.sh <cycles>: bench-only fast path triad
- scripts/run_tiny_sllonly_triad.sh <cycles>: SLL-only + warmup + PGO triad
- scripts/run_tiny_sllonly_r12w192_triad.sh <cycles>: SLL-only tuned (32B: REFILL=12, WARMUP32=192)
- scripts/run_ultra_debug_sweep.sh <cycles> <batch>
- scripts/sweep_ultra_params.sh <cycles> <bench_batch>
- scripts/run_comprehensive_pair.sh
- scripts/run_random_mixed_matrix.sh <cycles>
Bench-only build flags (compile-time)
- HAKMEM_TINY_BENCH_FASTPATH=1: fixes the entry path to SLL→Mag→tiny refill (shortest path)
- HAKMEM_TINY_BENCH_SLL_ONLY=1: physically removes Mag (SLL-only); free also pushes directly to the SLL
- HAKMEM_TINY_BENCH_TINY_CLASSES=3: target classes (0..N; 3 → ≤64B)
- HAKMEM_TINY_BENCH_WARMUP8/16/32/64: initial warm-up count (example: 32=160-192)
- HAKMEM_TINY_BENCH_REFILL/REFILL8/16/32/64: refill count (example: REFILL32=12)
Makefile helpers
- bench_fastpath / pgo-benchfast-*: PGO for bench_fastpath
- bench_sll_only / pgo-benchsll-*: PGO for SLL-only
- pgo-benchsll-r12w192-*: PGO with REFILL/WARMUP tuned for 32B
Perf-Main preset (for mainline, safety-leaning, opt-in)
- Recommended environment variables (example):
  - `HAKMEM_TINY_TLS_SLL=1`
  - `HAKMEM_TINY_REFILL_MAX=96`
  - `HAKMEM_TINY_REFILL_MAX_HOT=192`
  - `HAKMEM_TINY_SPILL_HYST=16`
- Example runs:
  - TinyHot triad: `HAKMEM_TINY_TLS_SLL=1 HAKMEM_TINY_REFILL_MAX=96 HAKMEM_TINY_REFILL_MAX_HOT=192 HAKMEM_TINY_SPILL_HYST=16 bash scripts/run_tiny_hot_triad.sh 60000`
  - RandomMixed: `HAKMEM_TINY_TLS_SLL=1 HAKMEM_TINY_REFILL_MAX=96 HAKMEM_TINY_REFILL_MAX_HOT=192 HAKMEM_TINY_SPILL_HYST=16 bash scripts/run_random_mixed_matrix.sh 100000`
LD safety (for apps/LD_PRELOAD runs)
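- HAKMEM_LD_SAFE=0/1/2
  - 0: full (recommended for development only)
  - 1: Tiny only (non-Tiny is delegated to libc)
  - 2: pass-through (recommended default)
- HAKMEM_TINY_SPECIALIZE_8_16=0/1
  - Enables a "mag-pop only" specialized path for 8/16B (default OFF). For A/B.
- HAKMEM_TINY_SPECIALIZE_32_64=0/1
  - Enables a "mag-pop only" specialized path for 32/64B (default OFF). For A/B.
- HAKMEM_TINY_SPECIALIZE_MASK=<int> (new)
  - Bitmask that enables specialization per class (bit N = class N: bit0=8B, bit1=16B, bit2=32B, bit3=64B).
  - Example: 0x02 → specialize 16B only; 0x0C → specialize 32/64B.
- HAKMEM_TINY_BENCH_MODE=1
  - Enables a simplified adopt path for benchmarks only. Uses a single per-class publish slot and avoids the superslab_refill scan and the multi-stage ring traversal.
  - The OOM guard (harvest/trim) is retained. Restrict use to A/B runs.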
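
Reading the `HAKMEM_TINY_SPECIALIZE_MASK` examples above (0x02 → 16B, 0x0C → 32/64B) as "bit N selects class N", the per-class test is a single shift and mask. `class_is_specialized` is a hypothetical helper written for illustration.

```c
#include <stdbool.h>

/* True when bit `class_idx` of `mask` is set (class 0 = 8B, 1 = 16B,
 * 2 = 32B, 3 = 64B). Out-of-range class indices are never specialized. */
static bool class_is_specialized(unsigned mask, int class_idx) {
    return class_idx >= 0 && class_idx < 8 && ((mask >> class_idx) & 1u) != 0;
}
```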
---
ENV Cleanup Progress (2025-11)
==============================
Phase 4a: Hot Path getenv Caching
---------------------------------
COMPLETED: All getenv() calls in hot paths are now properly cached.
Fixed files:
- `core/hakmem_elo.c` - Added `is_quiet()` helper with cached `g_quiet_mode` flag
  - Was: 10+ getenv("HAKMEM_QUIET") calls inside loops
  - Now: Single cached lookup at first call
Verified (already correct):
- `core/hakmem_tiny_superslab.c` - Uses `static int g_*` caching pattern
- `core/hakmem_shared_pool.c` - Uses `static int xxx = -1` caching
- `core/hakmem_learner.c` - getenv outside main loop (thread start only)
- `core/box/pool_init_api.inc.h` - Init function (called once via pthread_once)
- `core/box/hak_core_init.inc.h` - Init function (called once via pthread_once)
- `core/box/hak_wrappers.inc.h` - Uses `static int on=-1` caching
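
The `static int ... = -1` caching pattern mentioned above can be sketched as follows. This is an illustration of the general idiom, not the code in `hakmem_elo.c`: `is_quiet_sketch`, the `g_getenv_calls` counter (added only to make the single lookup observable), and the rule that any value not starting with `0` counts as "on" are all assumptions.

```c
#include <stdlib.h>

/* Instrumentation for this example only: counts real getenv() lookups. */
static int g_getenv_calls;

/* Returns 1 when HAKMEM_QUIET is set to something not starting with '0'.
 * The environment is consulted on the first call only; later calls
 * return the cached value. */
static int is_quiet_sketch(void) {
    static int cached = -1;               /* -1 = not looked up yet */
    if (cached < 0) {
        g_getenv_calls++;
        const char *s = getenv("HAKMEM_QUIET");
        cached = (s && *s && *s != '0') ? 1 : 0;
    }
    return cached;
}
```

The plain `static int` is not synchronized. Two threads racing on the first call would both compute the same value, but a strictly conforming version would use an atomic or `pthread_once`.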
ENV Statistics (from ENV_VARIABLE_SURVEY.md):
- Total ENV variables: 228
- Target after cleanup: ~80 (65% reduction)
- Categories:
  - Core/Toggle: 15 (7%)
  - Learning/Adaptive: 25 (11%)
  - Performance Tuning: 45 (20%)
  - Debug/Diagnostic: 65 (28%) ← Consolidation target
  - Superslab/Backend: 25 (11%)
  - TLS SLL (P2/P3): 20 (9%)
  - Free Path Optimization: 15 (7%)
  - Other: 23 (10%)
Phase 4b: Master Debug Control (COMPLETED)
------------------------------------------
New in 2025-11: Centralized debug control that works alongside individual module ENVs.
- `HAKMEM_DEBUG_ALL=1`
  - Enable ALL debug modules at once (convenient for troubleshooting)
  - Individual module ENVs (e.g., HAKMEM_SFC_DEBUG=0) can still override
- `HAKMEM_DEBUG_LEVEL=N`
  - Set debug level: 0=off, 1=critical, 2=normal, 3=verbose
  - When set to 2+, enables debug output for modules that don't have explicit ENV
- `HAKMEM_QUIET=1`
  - Suppress ALL debug output (highest priority, overrides DEBUG_ALL/LEVEL)
Priority order:
1. HAKMEM_QUIET=1 → suppress all
2. Specific module ENV (e.g., HAKMEM_SFC_DEBUG=1) → use that value
3. HAKMEM_DEBUG_ALL=1 → enable all
4. HAKMEM_DEBUG_LEVEL >= threshold → enable
5. Default → disabled
Implementation: core/hakmem_debug_master.h
- hak_debug_check("HAKMEM_FOO_DEBUG") - Check if module should enable debug
- hak_is_quiet() - Quick check for quiet mode
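
The five-step priority order above can be written as a small decision function. The sketch takes the raw environment strings as arguments so the logic stays easy to test; `debug_decide_sketch` and the per-module `threshold` parameter are hypothetical and do not reproduce the real `hak_debug_check()`.

```c
#include <stdlib.h>

/* Interprets an environment string as an int; NULL or empty gives fallback. */
static int env_str_to_int(const char *s, int fallback) {
    return (s && *s) ? atoi(s) : fallback;
}

/* Arguments are getenv() results for HAKMEM_QUIET, the module's own ENV
 * (e.g. HAKMEM_SFC_DEBUG), HAKMEM_DEBUG_ALL and HAKMEM_DEBUG_LEVEL.
 * threshold is an assumed per-module level (e.g. 2 for "normal"). */
static int debug_decide_sketch(const char *quiet, const char *module,
                               const char *all, const char *level,
                               int threshold) {
    if (env_str_to_int(quiet, 0) == 1) return 0;           /* 1. quiet wins */
    if (module && *module) return atoi(module) != 0;       /* 2. module ENV */
    if (env_str_to_int(all, 0) == 1) return 1;             /* 3. DEBUG_ALL */
    if (env_str_to_int(level, 0) >= threshold) return 1;   /* 4. DEBUG_LEVEL */
    return 0;                                              /* 5. default off */
}
```

A real caller would cache the result per module, in line with the hot-path caching rule from Phase 4a.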
Phase 4c: Master Trace Control (COMPLETED)
------------------------------------------
New in 2025-11: Unified trace control.
- `HAKMEM_TRACE=all`
  - Enable ALL trace modules at once
- `HAKMEM_TRACE=ptr,refill,free,mailbox`
  - Enable specific trace modules (comma-separated)
- `HAKMEM_TRACE_LEVEL=N`
  - Set trace verbosity (1=basic, 2=detailed, 3=verbose)
Available trace modules:
ptr, refill, superslab, ring, free, mailbox, registry
Implementation: core/hakmem_trace_master.h
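
Matching a module name against a comma-separated `HAKMEM_TRACE` value can be sketched as below. `trace_spec_has` is a hypothetical helper; exact-token matching and the special case for `all` follow the description above, while details such as whitespace handling are left out and are not taken from `hakmem_trace_master.h`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when `module` is one of the comma-separated tokens in `spec`,
 * or when `spec` is exactly "all". Tokens must match in full, so
 * "ptr" does not match the token "ptrx". */
static bool trace_spec_has(const char *spec, const char *module) {
    if (!spec || !module) return false;
    if (strcmp(spec, "all") == 0) return true;
    size_t mlen = strlen(module);
    const char *p = spec;
    while (*p) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len == mlen && strncmp(p, module, mlen) == 0) return true;
        if (!comma) break;
        p = comma + 1;
    }
    return false;
}
```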
Phase 4d: Master Stats Control (COMPLETED)
------------------------------------------
New in 2025-11: Unified stats/dump control.
- `HAKMEM_STATS=all`
  - Enable ALL stats modules at once
- `HAKMEM_STATS=sfc,fast,pool`
  - Enable specific stats modules (comma-separated)
- `HAKMEM_STATS_DUMP=1`
  - Dump stats at process exit
Available stats modules:
sfc, fast, heap, refill, counters, ring, invariant,
pagefault, front, pool, slim, guard, nearempty, trace
Implementation: core/hakmem_stats_master.h
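
Dump-at-exit behavior of the kind `HAKMEM_STATS_DUMP=1` describes can be built on the standard `atexit()` hook. The sketch below only illustrates that pattern and is not taken from core/hakmem_stats_master.h: `stats_dump_init`, `stats_dump_at_exit`, the counter, and the output line are all hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter standing in for real per-module statistics. */
static unsigned long g_example_counter;

/* Exit handler: prints the collected numbers to stderr. */
static void stats_dump_at_exit(void)
{
    fprintf(stderr, "[STATS] example_counter=%lu\n", g_example_counter);
}

/* Register the exit handler when the env value is a non-zero number.
 * Returns 1 if the handler is registered, 0 otherwise. Calling it again
 * after a successful registration does not register a second handler. */
static int stats_dump_init(const char *dump_env)
{
    static int registered = 0;

    if (registered) return 1;
    if (dump_env == NULL || atoi(dump_env) == 0) return 0;
    if (atexit(stats_dump_at_exit) != 0) return 0;

    registered = 1;
    return 1;
}
```

In this sketch the caller would pass `getenv("HAKMEM_STATS_DUMP")` once during allocator initialization.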
---
ENV Cleanup Summary (2025-11)
=============================
- Phase 4a: Hot path getenv caching (hakmem_elo.c fixed)
- Phase 4b: HAKMEM_DEBUG_ALL/LEVEL master debug control
- Phase 4c: HAKMEM_TRACE unified trace control
- Phase 4d: HAKMEM_STATS unified stats control
Total new master control variables: 6
HAKMEM_DEBUG_ALL, HAKMEM_DEBUG_LEVEL
HAKMEM_TRACE, HAKMEM_TRACE_LEVEL
HAKMEM_STATS, HAKMEM_STATS_DUMP
All existing individual ENVs continue to work (backwards compatible)
---
Benchmark Environment Variables (2025-11-29)
=============================================
## HAKMEM_BENCH_FAST_FRONT
Enable ultra-fast header-based free path for benchmarks.
- **Default**: 0 (OFF)
- **Usage**: `HAKMEM_BENCH_FAST_FRONT=1 ./bench_random_mixed_hakmem`
- **Effect**: Tries hak_tiny_free_fast_v2() before normal free path
- **Location**: core/box/hak_free_api.inc.h:98-114
- **A/B Testing**: Compare throughput with/without this flag
## HAKMEM_BENCH_WARMUP
Number of warmup cycles before timed benchmark run.
- **Default**: 0 (no warmup)
- **Usage**: `HAKMEM_BENCH_WARMUP=1000000 ./bench_random_mixed_hakmem`
- **Effect**: Runs N allocation cycles (not timed) before starting benchmark
- **Purpose**: Warm up TLS caches, SuperSlabs, and system allocator
- **Location**: bench_random_mixed.c:69-92
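
A warmup phase of this kind can be sketched as follows. This is an assumption-laden illustration, not the code in bench_random_mixed.c: the helper names are hypothetical, malformed or negative values are assumed to mean "no warmup", and one cycle is assumed to be a single malloc/free pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a HAKMEM_BENCH_WARMUP-style value. Unset, empty, negative, or
 * malformed input is treated as 0 (no warmup). */
static long bench_warmup_cycles(const char *value)
{
    if (value == NULL || *value == '\0') return 0;

    char *end = NULL;
    errno = 0;
    long n = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || n < 0) return 0;
    return n;
}

/* Run n untimed malloc/free cycles of the given size.
 * Returns the number of cycles actually completed. */
static long bench_warmup_run(long n, size_t size)
{
    assert(size > 0);

    long done = 0;
    for (long i = 0; i < n; i++) {
        void *p = malloc(size);
        if (p == NULL) break;
        *(volatile char *)p = 0;   /* touch the block so the page is mapped */
        free(p);
        done++;
    }
    return done;
}
```

The timed section would start only after `bench_warmup_run()` returns, so cache population is not charged to the measured throughput.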
## HAKMEM_FREE_ROUTE_TRACE
Debug trace for free() routing decisions.
- **Default**: 0 (OFF)
- **Usage**: `HAKMEM_FREE_ROUTE_TRACE=1 ./bench_random_mixed_hakmem`
- **Effect**: Logs first 32 free() routing decisions (tiny/pool/mid/external)
- **Output**: `[FREE_ROUTE] <domain> ptr=<addr>`
- **Location**: core/box/hak_free_api.inc.h:17-40
- **Use case**: Debug which Box handles each free()
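
Capping a trace at the first 32 events is commonly done with an atomic counter. The sketch below shows that pattern under stated assumptions: `free_route_trace` and `g_free_route_logged` are hypothetical names, and only the output format is taken from the description above; the code in core/box/hak_free_api.inc.h may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define FREE_ROUTE_TRACE_LIMIT 32

/* Number of routing decisions logged so far (shared by all threads). */
static atomic_int g_free_route_logged;

/* Log one routing decision. Returns 1 if a line was written, 0 if tracing
 * is disabled or the first FREE_ROUTE_TRACE_LIMIT events were already logged. */
static int free_route_trace(int enabled, const char *domain, void *ptr)
{
    assert(domain != NULL);

    if (!enabled) return 0;

    /* Cheap early exit; also keeps the counter from growing without bound. */
    if (atomic_load_explicit(&g_free_route_logged, memory_order_relaxed)
            >= FREE_ROUTE_TRACE_LIMIT) {
        return 0;
    }

    /* Claim a slot; concurrent callers may race past the check above. */
    int slot = atomic_fetch_add_explicit(&g_free_route_logged, 1,
                                         memory_order_relaxed);
    if (slot >= FREE_ROUTE_TRACE_LIMIT) return 0;

    fprintf(stderr, "[FREE_ROUTE] %s ptr=%p\n", domain, ptr);
    return 1;
}
```

Relaxed ordering is enough here because the counter only limits log volume and does not publish any other data.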
## HAKMEM_EXTERNAL_GUARD_LOG
Enable ExternalGuard debug logging.
- **Default**: 0 (OFF)
- **Usage**: `HAKMEM_EXTERNAL_GUARD_LOG=1 ./bench`
- **Effect**: Logs all ExternalGuard calls with ptr info, SuperSlab lookup, FrontGate classification
- **Location**: core/box/external_guard_box.h:40-48
- **Use case**: Debug unknown pointers that reach ExternalGuard
## HAKMEM_EXTERNAL_GUARD_STATS
Print ExternalGuard statistics at exit.
- **Default**: 0 (OFF)
- **Usage**: `HAKMEM_EXTERNAL_GUARD_STATS=1 ./bench`
- **Output**: Total calls, unknown ptrs, etc.
- **Location**: core/box/external_guard_box.h:140-162
---
Build Flags (Compile-time, set via Makefile)
=============================================
## HAKMEM_TINY_SS_TRUST_MMAP_ZERO
Skip large memset() for fresh mmap SuperSlabs (trust OS zero pages).
- **Default**: 0 (defensive memset enabled)
- **Build**: `HAKMEM_TINY_SS_TRUST_MMAP_ZERO=1 make bench_random_mixed_hakmem`
  - Or: `EXTRA_MAKEFLAGS="HAKMEM_TINY_SS_TRUST_MMAP_ZERO=1" ./build.sh release bench_random_mixed_hakmem`
- **Effect**:
  - When =1 AND release build AND fresh mmap (not from cache):
    - Skip memset for slabs/remote_heads/remote_counts/slab_listed arrays
    - Still memset class_map to 255 (UNASSIGNED)
  - Cache reuse (from_cache=1): Always full memset (defensive)
- **Performance**: +5.93% on bench_tiny_hot, neutral on bench_random_mixed
- **Safety**: Only activates in release builds; cache reuse always gets full memset
- **Location**:
  - Flag definition: core/hakmem_build_flags.h:170-180
  - Implementation: core/box/ss_allocation_box.c:37-78
- **A/B Testing**: Compare fresh SuperSlab allocation throughput
- **Recommendation**: Keep default=0 for safety; enable for production after testing
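
The decision logic described under **Effect** can be sketched as below. This is a simplified stand-in, not the code in core/box/ss_allocation_box.c: `ExampleSuperSlab` holds only a few of the arrays named above, its sizes are arbitrary, and the compile-time flag and release-build condition are modeled as plain function parameters so the behavior is easy to exercise.

```c
#include <assert.h>
#include <string.h>

enum { EX_SLABS = 32, EX_CLASS_UNASSIGNED = 255 };

/* Hypothetical, reduced SuperSlab header used only for this sketch. */
typedef struct {
    unsigned char class_map[EX_SLABS];
    unsigned int  remote_counts[EX_SLABS];
    unsigned char slab_listed[EX_SLABS];
} ExampleSuperSlab;

/* Initialize the header. Returns 1 if the large zeroing memset was skipped.
 * The skip applies only when the flag is set, the build is a release build,
 * and the memory is a fresh mmap (not reused from the cache). */
static int example_ss_init(ExampleSuperSlab *ss, int trust_mmap_zero,
                           int release_build, int from_cache)
{
    assert(ss != NULL);

    int skip = trust_mmap_zero && release_build && !from_cache;
    if (!skip) {
        /* Defensive path: do not rely on the OS handing out zero pages. */
        memset(ss->remote_counts, 0, sizeof ss->remote_counts);
        memset(ss->slab_listed, 0, sizeof ss->slab_listed);
    }

    /* class_map is always reset, because UNASSIGNED is 255 rather than 0. */
    memset(ss->class_map, EX_CLASS_UNASSIGNED, sizeof ss->class_map);
    return skip;
}
```

Note that the class_map reset cannot be skipped in either path: zero pages from mmap would read as class 0, not as UNASSIGNED.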
## HAKMEM_DISABLE_MINCORE_CHECK (REMOVED in Phase 3)
**REMOVED**: Dropped in commit d78baf41c (Phase 3: Remove mincore() syscall completely)
- Previously controlled the mincore() syscall in the free path
- The mincore() check is now always disabled; the free path trusts internal metadata
- See: CHECKPOINT_PHASE2_COMPLETE.md, PHASE2_PERF_ANALYSIS.md
---
## HAKMEM_TINY_FREE_POLICY_FAST_V2 (Phase POLICY-FAST-PATH-V2)
Skip policy snapshot for known-legacy classes in free path.
- **Value**: 0 (OFF), 1 (ON)
- **Default**: 0
- **Availability**: Only effective when Learner (v7) is disabled
- **Impact**: Can improve Mixed workload performance by 5-10%
- **A/B Testing**: Use for Mixed vs C6-heavy comparison
- **Note**: Disabled automatically if HAKMEM_SMALL_HEAP_V7_ENABLED=1
### Background
In Phase v11b-1, the free path uses a policy snapshot to route frees to different backends (ULTRA, MID v3.5, v7, legacy). This policy check adds overhead even when all classes use the legacy path. Phase POLICY-FAST-PATH-V2 introduces a fast-path bypass that skips the policy snapshot for classes known at startup to use only the legacy path.
### How it works
1. At startup, compute a non-legacy mask by checking which classes have ULTRA, MID v3, or MID v3.5 enabled
2. In the free hot path, if a class is NOT in the non-legacy mask, skip the policy snapshot and jump directly to legacy fallback
3. Automatically disabled if the Learner (v7) is enabled, since v7 policies are dynamic
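
The three steps above can be sketched as a startup mask plus a hot-path bit test. This is an illustration under assumptions, not the HAKMEM implementation: the function names are hypothetical, and the per-backend enable masks are assumed to be available as one bit per class.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1: a class is non-legacy if ULTRA, MID v3, or MID v3.5 is enabled
 * for it. Each argument holds one bit per class index. */
static uint32_t ex_nonlegacy_mask(uint32_t ultra_classes,
                                  uint32_t mid_v3_classes,
                                  uint32_t mid_v35_classes)
{
    return ultra_classes | mid_v3_classes | mid_v35_classes;
}

/* Steps 2 and 3: returns 1 if the free path may skip the policy snapshot
 * and go straight to the legacy fallback for class_idx. */
static int ex_free_can_skip_policy(int fast_v2_enabled, int learner_v7_enabled,
                                   uint32_t nonlegacy_mask, unsigned class_idx)
{
    assert(class_idx < 32);

    /* v7 policies are dynamic, so a mask computed at startup is not valid. */
    if (!fast_v2_enabled || learner_v7_enabled) return 0;

    return (nonlegacy_mask & (UINT32_C(1) << class_idx)) == 0;
}
```

Because the mask is computed once, the hot-path cost in this sketch is a single load, shift, and branch per free.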
### Observability
Use `HAKMEM_FREE_PATH_STATS=1` to see skip counts:
```
[FREE_PATH_STATS_POLICY_FASTV2] skip=12345678
```
### Example usage
```bash
# Enable fast-path optimization for Mixed workload
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE \
HAKMEM_TINY_FREE_POLICY_FAST_V2=1 \
./bench_random_mixed_hakmem 1000000 400 1
```
---
Update History:
- 2025-11-29: Added benchmark env vars (BENCH_FAST_FRONT, BENCH_WARMUP, FREE_ROUTE_TRACE, EXTERNAL_GUARD_LOG, EXTERNAL_GUARD_STATS)
- 2025-11-29: Added HAKMEM_TINY_SS_TRUST_MMAP_ZERO build flag
- 2025-11-29: Marked DISABLE_MINCORE_CHECK as removed