hakmem/docs/specs/DOCS_INDEX.md

HAKMEM Docs Index (2025-10-29)

Purpose

  • One-page map for current work: how to build, run, compare, and tune.
  • Focus on Tiny fast-path tuning vs system/mimalloc, with safe LD guidance.

Quick Build

  • Direct link (recommended for perf tuning)
    • make bench_fast
    • Run: HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem
  • PGO (direct link)
    • ./build_pgo.sh (profile+build)
    • Run: HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem
  • Shared (LD_PRELOAD) PGO
    • make pgo-profile-shared && make pgo-build-shared
    • Run: HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system

Direct-Link Comparisons (CSV)

  • Pair (HAKMEM vs mimalloc): bash scripts/run_comprehensive_pair.sh
    • CSV: bench_results/comp_pair_YYYYMMDD_HHMMSS/summary.csv
  • Tiny hot triad (HAKMEM/System/mimalloc): bash scripts/run_tiny_hot_triad.sh 80000
    • CSV: bench_results/tiny_hot_triad_YYYYMMDD_HHMMSS/results.csv
  • Random mixed triad: bash scripts/run_random_mixed_matrix.sh 120000
    • CSV: bench_results/random_mixed_YYYYMMDD_HHMMSS/results.csv
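Each script above writes into a fresh timestamped directory, so the newest CSV has to be looked up. A minimal sketch, assuming the directory names follow the `<prefix>_YYYYMMDD_HHMMSS` pattern shown above (so a name sort is also a time sort); `latest_csv` is a hypothetical helper, not part of the repo scripts:

```shell
# latest_csv ROOT PREFIX FILE
# Print the path of FILE inside the newest ROOT/PREFIX_YYYYMMDD_HHMMSS directory.
# Returns non-zero if no matching directory or file exists.
latest_csv() {
  local root="$1" prefix="$2" name="$3" dir
  # Timestamps are zero-padded, so lexicographic order equals chronological order.
  dir=$(find "$root" -maxdepth 1 -type d -name "${prefix}_*" | sort | tail -n 1)
  [ -n "$dir" ] && [ -f "$dir/$name" ] && printf '%s\n' "$dir/$name"
}
```

Usage: `latest_csv bench_results tiny_hot_triad results.csv` or `latest_csv bench_results comp_pair summary.csv`.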

PerfMain preset (safe, mainline-oriented)

  • Build + run triad: bash scripts/run_perf_main_triad.sh 60000
    • Applies recommended tiny env (TLS_SLL=1, REFILL_MAX=96, HOT=192, HYST=16) without bench-only macros.

Tiny param sweeps

  • Basic: bash scripts/sweep_tiny_params.sh 100000
  • Advanced (SLL multiplier / refill / per-class MAG, etc.): bash scripts/sweep_tiny_advanced.sh 80000 --mag64-512

LD_PRELOAD Apps (opt-in)

  • Script: bash scripts/run_apps_with_hakmem.sh
  • Default safety: HAKMEM_LD_SAFE=2 (pass-through) is set in the script, then LD_PRELOAD is turned on per case.
  • Recommendation: use direct link for perf; LD runs are for stability sampling only.

Tiny Modes and Knobs

  • Normal (default): TLS magazine + TLS SLL (≤256B)
    • HAKMEM_TINY_TLS_SLL=1 (default)
    • HAKMEM_TINY_MAG_CAP=128 (good tiny bench preset; 64B may prefer 512)
  • TinyQuickSlot (minimal front end; experimental)
    • HAKMEM_TINY_QUICK=1
    • Keeps items[6] in a single cache line. On a miss, refills a small amount from SLL/Mag and returns immediately.
  • Ultra (SLL-only, experimental):
    • HAKMEM_TINY_ULTRA=1 (opt-in)
    • HAKMEM_TINY_ULTRA_VALIDATE=0/1 (perf vs safety)
    • Per-class overrides: HAKMEM_TINY_ULTRA_BATCH_C{0..7}, HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}
  • FLINT (Fast Lightweight INTelligence): Frontend + deferred Intelligence (experimental)
    • HAKMEM_TINY_FRONTEND=1 (enable array FastCache; miss falls back)
    • HAKMEM_TINY_FASTCACHE=1 (low-level switch; keep OFF unless A/B)
    • HAKMEM_INT_ENGINE=1 (event ring + BG thread adjusts fill targets)
    • Event extension (internal): timestamp/tier/flags/site_id/thread are accumulated in the ring (off the hot path), for use in future adaptation.
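The per-class Ultra overrides listed above expand to sixteen variables (C0..C7 for each of the two knobs). A minimal sketch of setting them in a loop; `set_ultra_per_class` is a hypothetical helper, and the values 32 and 256 are placeholders for illustration, not tuned recommendations:

```shell
# set_ultra_per_class BATCH SLL_CAP
# Export the same batch size and SLL cap for every tiny class C0..C7.
set_ultra_per_class() {
  local batch="$1" sll_cap="$2" c
  for c in 0 1 2 3 4 5 6 7; do
    export "HAKMEM_TINY_ULTRA_BATCH_C${c}=${batch}"
    export "HAKMEM_TINY_ULTRA_SLL_CAP_C${c}=${sll_cap}"
  done
}

export HAKMEM_TINY_ULTRA=1      # Ultra is opt-in
set_ultra_per_class 32 256      # placeholder values, adjust per A/B results
```

Individual classes can then be overridden afterwards, for example `export HAKMEM_TINY_ULTRA_SLL_CAP_C0=512` (again a placeholder value).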

Best-Known Presets (direct link)

  • Tiny hot focus
    • export HAKMEM_WRAP_TINY=1
    • export HAKMEM_TINY_TLS_SLL=1
    • export HAKMEM_TINY_MAG_CAP=128 (64B: try 512)
    • export HAKMEM_TINY_REMOTE_DRAIN_TRYRATE=0
    • export HAKMEM_TINY_REMOTE_DRAIN_THRESHOLD=1000000
  • Memory efficiency A/B
    • export HAKMEM_TINY_FLUSH_ON_EXIT=1
    • Run bench/app; compare steady-state RSS with/without.
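The exports above stay in the shell and can leak into later A/B runs. A minimal sketch that applies the Tiny hot preset to a single command only; `with_tiny_hot_preset` and the `MAG_CAP` override variable are hypothetical names introduced for illustration:

```shell
# with_tiny_hot_preset CMD [ARGS...]
# Run CMD with the Tiny hot preset; the calling shell's environment is untouched.
with_tiny_hot_preset() {
  env HAKMEM_WRAP_TINY=1 \
      HAKMEM_TINY_TLS_SLL=1 \
      HAKMEM_TINY_MAG_CAP="${MAG_CAP:-128}" \
      HAKMEM_TINY_REMOTE_DRAIN_TRYRATE=0 \
      HAKMEM_TINY_REMOTE_DRAIN_THRESHOLD=1000000 \
      "$@"
}
```

Usage: `with_tiny_hot_preset ./bench_comprehensive_hakmem`, or `MAG_CAP=512 with_tiny_hot_preset ./bench_comprehensive_hakmem` to try the 64B suggestion.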

Refill Batch (A/B)

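  • HAKMEM_TINY_REFILL_MAX_HOT (default 192) / HAKMEM_TINY_REFILL_MAX (default 64)
  • Search for the peak in the small size band (8/16/32B). In the current environment, values near the defaults fall in the best band.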
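A refill A/B can be scripted as a small grid around the defaults. A minimal sketch; `sweep_refill`, `REFILL_HOT_LIST`, and `REFILL_MAX_LIST` are hypothetical names, and the default grids are illustrative values chosen around 192 and 64, not measured optima:

```shell
# sweep_refill CMD [ARGS...]
# Run CMD once per (REFILL_MAX_HOT, REFILL_MAX) pair, printing a label before each run.
sweep_refill() {
  [ "$#" -gt 0 ] || { echo "usage: sweep_refill CMD [ARGS...]" >&2; return 2; }
  local hot max
  # Lists are intentionally unquoted so they split into separate values.
  for hot in ${REFILL_HOT_LIST:-128 192 256}; do
    for max in ${REFILL_MAX_LIST:-48 64 96}; do
      printf 'hot=%s max=%s\n' "$hot" "$max"
      # The assignments apply to this one invocation only.
      HAKMEM_TINY_REFILL_MAX_HOT="$hot" HAKMEM_TINY_REFILL_MAX="$max" "$@"
    done
  done
}
```

Usage: `HAKMEM_WRAP_TINY=1 sweep_refill ./bench_comprehensive_hakmem`, then compare the labelled output blocks.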

Current Results (high level)

  • Tiny hot triad (PerfMain, 60-80k cycles, safe):
    • 16-64B: System ≈ 300-335 M; HAKMEM ≈ 250-300 M; mimalloc 535-620 M.
    • 128B: HAKMEM ≈ 250-270 M; System 170-176 M; mimalloc 575-586 M.
  • Comprehensive (direct link): mimalloc ≈ 0.9-1.0B; HAKMEM ≈ 0.25-0.27B.
  • Random mixed: three close; mimalloc slightly ahead; HAKMEM ≈ System ± a few %.

Bench-only highlight (reference values, dedicated build)

  • SLL-only + warmup + PGO (≤64B): 8-24B exceeds 400M; 32B/b100 peaks at 429.18M (System 312.55M).
    • Run: bash scripts/run_tiny_sllonly_triad.sh 30000 (not included in the safe normal build)

Open Focus

  • Close the 16-64B gap (cap/batch tuning; SLL/mini-mag overhead shave).
  • Ultra (opt-in) stabilization; A/B vs normal.
  • Frontend refill heuristics; BG engine stop/join wiring (added).

Mid Range MT (8-32KB, mimalloc-style)

  • Status: COMPLETE (2025-11-01) - 110M ops/sec achieved
  • Quick benchmark: bash benchmarks/scripts/mid/run_mid_mt_bench.sh
  • Comparison: bash benchmarks/scripts/mid/compare_mid_mt_allocators.sh
  • Full report: MID_MT_COMPLETION_REPORT.md
  • Implementation: core/hakmem_mid_mt.{c,h}
  • Results: 110M ops/sec (100-101% of mimalloc, 2.12x faster than glibc)

ACE Learning Layer (Adaptive Control Engine)

  • Status: Phase 1 COMPLETE (2025-11-01) - Infrastructure ready 🚀
  • Goal: Fix weaknesses with adaptive learning (aiming to surpass mimalloc)
    • Fragmentation stress: 3.87 → 10-20 M ops/s (2.6-5.2x target)
    • Large WS: 22.15 → 30-45 M ops/s (1.4-2.0x target)
    • realloc: 277ns → 140-210ns (1.3-2.0x target)
  • Documentation:
    • User guide: docs/ACE_LEARNING_LAYER.md
    • Technical plan: docs/ACE_LEARNING_LAYER_PLAN.md
    • Progress report: ACE_PHASE1_PROGRESS.md
  • Phase 1 Deliverables (COMPLETE):
    • Metrics collection (hakmem_ace_metrics.{c,h})
    • UCB1 learning algorithm (hakmem_ace_ucb1.{c,h})
    • Dual-loop controller (hakmem_ace_controller.{c,h})
    • Dynamic TLS capacity adjustment
    • Hot-path metrics integration (alloc/free tracking)
    • A/B benchmark script (scripts/bench_ace_ab.sh)
  • Usage:
    • Enable: HAKMEM_ACE_ENABLED=1 ./your_benchmark
    • Debug: HAKMEM_ACE_ENABLED=1 HAKMEM_ACE_LOG_LEVEL=2 ./your_benchmark
    • A/B test: ./scripts/bench_ace_ab.sh
  • Next: Phase 2 - Extended benchmarking + learning convergence validation
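For a quick manual A/B outside scripts/bench_ace_ab.sh, the same command can be run once without and once with HAKMEM_ACE_ENABLED=1. A minimal sketch; `ace_ab` is a hypothetical helper and makes no claim about what the repo script does internally. The "off" run unsets the variable rather than setting it to 0, since the source does not say how 0 is interpreted:

```shell
# ace_ab CMD [ARGS...]
# Run CMD twice: first with ACE disabled (variable unset), then with ACE enabled.
ace_ab() {
  [ "$#" -gt 0 ] || { echo "usage: ace_ab CMD [ARGS...]" >&2; return 2; }
  echo "== ACE off =="
  # Subshell keeps the unset local to this run.
  ( unset HAKMEM_ACE_ENABLED; "$@" )
  echo "== ACE on =="
  HAKMEM_ACE_ENABLED=1 "$@"
}
```

Usage: `ace_ab ./your_benchmark`.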

Directory Structure (2025-11-01 Reorganization)

  • benchmarks/ - All benchmark-related files
    • src/ - Benchmark source code (tiny/mid/comprehensive/stress)
    • scripts/ - Benchmark scripts organized by category
    • results/ - Benchmark results (formerly bench_results/)
    • perf/ - Performance profiling data (formerly perf_data/)
  • tests/ - Test files (unit/integration/stress)
  • core/ - Core allocator implementation
  • docs/ - Documentation (benchmarks/, api/, guides/)
  • scripts/ - Development scripts (build/, apps/, maintenance/)
  • archive/ - Historical documents and analysis

Where to Read More

  • SlabHandle Box: docs/SLAB_HANDLE.md (encapsulates ownership + remote drain + metadata)
  • Free Safety: docs/FREE_SAFETY.md (fail-fast on double free / class mismatch, and ring operation)
  • Cleanup/Organization: CLEANUP_SUMMARY_2025_11_01.md (latest)
  • Archive: archive/README.md - Historical docs and analysis
  • Bench mode: BENCH_MODE.md
  • Env knobs: ENV_VARS.md
  • Tiny hot microbench: TINY_HOT_BENCH.md
  • Frontend/Backend split: FRONTEND_BACKEND_PLAN.md
  • LD status/safety: LD_PRELOAD_STATUS.md
  • Goals/Targets: GOALS_2025_10_29.md
  • Latest results: BENCH_RESULTS_2025_10_29.md (today), BENCH_RESULTS_2025_10_28.md (yesterday)
  • Mainline integration plan: MAINLINE_INTEGRATION.md
  • FLINT Intelligence (events/adaptation): FLINT_INTELLIGENCE.md

Hako / MIR / FFI

  • HAKO_MIR_FFI_SPEC.md: specification for type validation completed in the front end, MIR as a pure carrier, and mechanical FFI lowering

Notes

  • LD mode: keep HAKMEM_LD_SAFE=2 default for apps; prefer direct link for tuning.
  • Ultra/Frontend are experimental; keep OFF by default and use scripts for A/B.