HAKMEM Docs Index (2025-10-29)
Purpose
- One-page map for current work: how to build, run, compare, and tune.
- Focus on Tiny fast-path tuning vs system/mimalloc, with safe LD guidance.
Quick Build
- Direct link (recommended for perf tuning)
  - `make bench_fast`
  - Run: `HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem`
- PGO (direct link)
  - `./build_pgo.sh` (profile+build)
  - Run: `HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem`
- Shared (LD_PRELOAD) PGO
  - `make pgo-profile-shared && make pgo-build-shared`
  - Run: `HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system`
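
The three variants above differ only in how the allocator is linked and which binary is run. A minimal sketch of a helper that maps a mode name to the run command listed above; the function name `hakmem_bench_cmd` and the mode labels `direct`, `pgo`, and `shared` are illustrative assumptions, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HAKMEM): print the run command for a link mode.
# The commands themselves are the ones listed under Quick Build.
hakmem_bench_cmd() {
  case "$1" in
    direct|pgo)
      # Direct link and direct-link PGO run the same benchmark binary.
      echo "HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem" ;;
    shared)
      # Shared PGO preloads the library into the system-allocator benchmark.
      echo "HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system" ;;
    *)
      echo "unknown mode: $1 (expected direct|pgo|shared)" >&2
      return 2 ;;
  esac
}
```

After building, something like `eval "$(hakmem_bench_cmd direct)"` from the repository root would run the matching benchmark.
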
Direct-Link Comparisons (CSV)
- Pair (HAKMEM vs mimalloc): `bash scripts/run_comprehensive_pair.sh`
  - CSV: `bench_results/comp_pair_YYYYMMDD_HHMMSS/summary.csv`
- Tiny hot triad (HAKMEM/System/mimalloc): `bash scripts/run_tiny_hot_triad.sh 80000`
  - CSV: `bench_results/tiny_hot_triad_YYYYMMDD_HHMMSS/results.csv`
- Random mixed triad: `bash scripts/run_random_mixed_matrix.sh 120000`
  - CSV: `bench_results/random_mixed_YYYYMMDD_HHMMSS/results.csv`
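
Because each run writes into a directory stamped `YYYYMMDD_HHMMSS`, the newest result sorts last lexically. A minimal sketch of locating the most recent CSV under that assumption; the function name `latest_result_csv` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the newest CSV for a result prefix.
# Usage: latest_result_csv <prefix> <csv-name> [results-root]
latest_result_csv() {
  local prefix="$1" csv="$2" root="${3:-bench_results}"
  local latest="" d
  for d in "$root"/"${prefix}"_*/; do
    [ -d "$d" ] || continue   # the glob matched nothing
    latest="${d%/}"           # globs expand in sorted order, so the last match is newest
  done
  [ -n "$latest" ] && [ -f "$latest/$csv" ] || return 1
  echo "$latest/$csv"
}
```

For example, `latest_result_csv comp_pair summary.csv` would print the newest pair summary, and `latest_result_csv tiny_hot_triad results.csv` the newest triad results.
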
Perf-Main preset (safe, mainline-oriented)
- Build + run triad: `bash scripts/run_perf_main_triad.sh 60000`
  - Applies recommended tiny env (TLS_SLL=1, REFILL_MAX=96, HOT=192, HYST=16) without bench-only macros.
Tiny param sweeps
- Basic: `bash scripts/sweep_tiny_params.sh 100000`
- Advanced (SLL multiplier, refill, per-class MAG, etc.): `bash scripts/sweep_tiny_advanced.sh 80000 --mag64-512`
LD_PRELOAD Apps (opt-in)
- Script: `bash scripts/run_apps_with_hakmem.sh`
- Default safety: `HAKMEM_LD_SAFE=2` (pass-through) set in script, then per-case `LD_PRELOAD` on.
- Recommendation: use direct-link for perf; LD runs are for stability sampling only.
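
The per-case pattern described above can be sketched as a wrapper that scopes `LD_PRELOAD` and `HAKMEM_LD_SAFE` to a single command. This is an illustration only, not the contents of `scripts/run_apps_with_hakmem.sh`; the function name `run_with_hakmem` and the fallback when the library is missing are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: preload the allocator for one command only.
# Usage: run_with_hakmem <path-to-libhakmem.so> <command> [args...]
run_with_hakmem() {
  local lib="$1"
  shift
  if [ ! -r "$lib" ]; then
    # Assumed behavior: fall back to a plain run rather than fail.
    echo "warning: $lib not readable; running without LD_PRELOAD" >&2
    "$@"
    return
  fi
  # Both variables apply to this command only, so the parent shell stays clean.
  # HAKMEM_LD_SAFE defaults to 2 (pass-through), as recommended above.
  env HAKMEM_LD_SAFE="${HAKMEM_LD_SAFE:-2}" LD_PRELOAD="$lib" "$@"
}
```

A call such as `run_with_hakmem ./libhakmem.so ./bench_comprehensive_system` keeps the preload from leaking into later commands in the same shell.
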
Tiny Modes and Knobs
- Normal (default): TLS magazine + TLS SLL (≤256B)
  - `HAKMEM_TINY_TLS_SLL=1` (default)
  - `HAKMEM_TINY_MAG_CAP=128` (good tiny bench preset; 64B may prefer 512)
- TinyQuickSlot (minimal front end; experimental)
  - `HAKMEM_TINY_QUICK=1`
  - Keeps items[6] in a single cache line. On a miss, refills a small batch from SLL/Mag and returns immediately.
- Ultra (SLL-only, experimental):
  - `HAKMEM_TINY_ULTRA=1` (opt-in)
  - `HAKMEM_TINY_ULTRA_VALIDATE=0/1` (perf vs safety)
  - Per-class overrides: `HAKMEM_TINY_ULTRA_BATCH_C{0..7}`, `HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}`
- FLINT (Fast Lightweight INTelligence): Frontend + deferred Intelligence (experimental)
  - `HAKMEM_TINY_FRONTEND=1` (enable array FastCache; miss falls back)
  - `HAKMEM_TINY_FASTCACHE=1` (low-level switch; keep OFF unless A/B)
  - `HAKMEM_INT_ENGINE=1` (event ring + BG thread adjusts fill targets)
  - Event extension (internal): timestamp/tier/flags/site_id/thread are accumulated in the ring (off the hot path) for use in future adaptation.
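
For A/B runs it can help to select a mode by name instead of retyping knobs. A minimal sketch follows, using only the variables listed above. The function name `tiny_mode_env`, the mode labels, and the choice of `HAKMEM_TINY_ULTRA_VALIDATE=1` as the safe side are assumptions, and whether the modes can be combined is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print VAR=value pairs for one Tiny mode.
tiny_mode_env() {
  case "$1" in
    normal) echo "HAKMEM_TINY_TLS_SLL=1 HAKMEM_TINY_MAG_CAP=128" ;;
    quick)  echo "HAKMEM_TINY_QUICK=1" ;;
    # Assumption: VALIDATE=1 is the safety-oriented setting.
    ultra)  echo "HAKMEM_TINY_ULTRA=1 HAKMEM_TINY_ULTRA_VALIDATE=1" ;;
    flint)  echo "HAKMEM_TINY_FRONTEND=1 HAKMEM_INT_ENGINE=1" ;;
    *)
      echo "unknown tiny mode: $1 (expected normal|quick|ultra|flint)" >&2
      return 2 ;;
  esac
}
```

The output is meant to be word-split into `env`, for example `env $(tiny_mode_env ultra) HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem`.
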
Best-Known Presets (direct link)
- Tiny hot focus
  - `export HAKMEM_WRAP_TINY=1`
  - `export HAKMEM_TINY_TLS_SLL=1`
  - `export HAKMEM_TINY_MAG_CAP=128` (64B: try 512)
  - `export HAKMEM_TINY_REMOTE_DRAIN_TRYRATE=0`
  - `export HAKMEM_TINY_REMOTE_DRAIN_THRESHOLD=1000000`
- Memory efficiency A/B
  - `export HAKMEM_TINY_FLUSH_ON_EXIT=1`
  - Run bench/app; compare steady-state RSS with/without.
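
The Tiny hot focus exports above can be bundled into one function so the preset is applied consistently. A minimal sketch; the function name `hakmem_preset_tiny_hot` and the optional magazine-cap argument are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the "Tiny hot focus" preset to the current shell.
# Usage: hakmem_preset_tiny_hot [mag_cap]   (default 128; try 512 for 64B)
hakmem_preset_tiny_hot() {
  export HAKMEM_WRAP_TINY=1
  export HAKMEM_TINY_TLS_SLL=1
  export HAKMEM_TINY_MAG_CAP="${1:-128}"
  export HAKMEM_TINY_REMOTE_DRAIN_TRYRATE=0
  export HAKMEM_TINY_REMOTE_DRAIN_THRESHOLD=1000000
}
```

Running it in a subshell, as in `( hakmem_preset_tiny_hot 512; ./bench_comprehensive_hakmem )`, keeps the exports from affecting later runs.
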
Refill Batch (A/B)
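- `HAKMEM_TINY_REFILL_MAX_HOT` (default 192) / `HAKMEM_TINY_REFILL_MAX` (default 64)
- Search for the peak in the small size range (8/16/32B). In the current environment, values near the defaults perform best.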
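
A refill-batch A/B amounts to running the same benchmark over a small grid of the two variables above. A minimal sketch under stated assumptions: the function name `sweep_refill` is hypothetical, and the grid values are illustrative picks around the defaults (192 and 64) and the Perf-Main value (96), not recommended settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once per (REFILL_MAX_HOT, REFILL_MAX) pair.
# Usage: sweep_refill <command> [args...]
sweep_refill() {
  local hot max
  for hot in 128 192 256; do        # illustrative values around the default 192
    for max in 48 64 96; do         # illustrative values around the default 64
      echo "# REFILL_MAX_HOT=$hot REFILL_MAX=$max"
      # The variables are scoped to this one invocation.
      env HAKMEM_TINY_REFILL_MAX_HOT="$hot" HAKMEM_TINY_REFILL_MAX="$max" "$@"
    done
  done
}
```

For example, `sweep_refill ./bench_comprehensive_hakmem > refill_sweep.log` would collect all nine runs in one log, each preceded by a header line naming its settings.
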
Current Results (high level)
- Tiny hot triad (Perf-Main, 60-80k cycles, safe):
  - 16-64B: System ≈ 300-335 M; HAKMEM ≈ 250-300 M; mimalloc 535-620 M.
  - 128B: HAKMEM ≈ 250-270 M; System 170-176 M; mimalloc 575-586 M.
- Comprehensive (direct link): mimalloc ≈ 0.9-1.0B; HAKMEM ≈ 0.25-0.27B.
- Random mixed: three close; mimalloc slightly ahead; HAKMEM ≈ System ± a few %.
Bench-only highlight (reference values, dedicated build)
- SLL-only + warmup + PGO (≤64B): 8-24B exceeds 400M; 32B/b100 peaks at 429.18M (System 312.55M).
- Run: `bash scripts/run_tiny_sllonly_triad.sh 30000` (not included in the safe normal build)
Open Focus
- Close the 16-64B gap (cap/batch tuning; SLL/mini-mag overhead shave).
- Ultra (opt-in) stabilization; A/B vs normal.
- Frontend refill heuristics; BG engine stop/join wiring (added).
Mid Range MT (8-32KB, mimalloc-style)
- **Status**: COMPLETE (2025-11-01) - 110M ops/sec achieved ✅
- Quick benchmark: `bash benchmarks/scripts/mid/run_mid_mt_bench.sh`
- Comparison: `bash benchmarks/scripts/mid/compare_mid_mt_allocators.sh`
- Full report: `MID_MT_COMPLETION_REPORT.md`
- Implementation: `core/hakmem_mid_mt.{c,h}`
- Results: 110M ops/sec (100-101% of mimalloc, 2.12x faster than glibc)
ACE Learning Layer (Adaptive Control Engine)
- **Status**: Phase 1 COMPLETE ✅ (2025-11-01) - Infrastructure ready 🚀
- **Goal**: Fix weaknesses with adaptive learning (aiming to surpass mimalloc)
  - Fragmentation stress: 3.87 → 10-20 M ops/s (2.6-5.2x target)
  - Large WS: 22.15 → 30-45 M ops/s (1.4-2.0x target)
  - realloc: 277ns → 140-210ns (1.3-2.0x target)
- **Documentation**:
  - User guide: `docs/ACE_LEARNING_LAYER.md`
  - Technical plan: `docs/ACE_LEARNING_LAYER_PLAN.md`
  - Progress report: `ACE_PHASE1_PROGRESS.md`
- **Phase 1 Deliverables** (COMPLETE ✅):
  - ✅ Metrics collection (`hakmem_ace_metrics.{c,h}`)
  - ✅ UCB1 learning algorithm (`hakmem_ace_ucb1.{c,h}`)
  - ✅ Dual-loop controller (`hakmem_ace_controller.{c,h}`)
  - ✅ Dynamic TLS capacity adjustment
  - ✅ Hot-path metrics integration (alloc/free tracking)
  - ✅ A/B benchmark script (`scripts/bench_ace_ab.sh`)
- **Usage**:
  - Enable: `HAKMEM_ACE_ENABLED=1 ./your_benchmark`
  - Debug: `HAKMEM_ACE_ENABLED=1 HAKMEM_ACE_LOG_LEVEL=2 ./your_benchmark`
  - A/B test: `./scripts/bench_ace_ab.sh`
- **Next**: Phase 2 - Extended benchmarking + learning convergence validation
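
For a quick manual check outside `scripts/bench_ace_ab.sh`, the same command can be run once with `HAKMEM_ACE_ENABLED` unset and once with it set to 1. The sketch below is not that script; the function name `ace_ab` is hypothetical, and treating "unset" as the ACE-off baseline is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with ACE off (variable unset), then on.
# Usage: ace_ab <command> [args...]
ace_ab() {
  echo "== ACE off =="
  # env -u removes the variable for this run even if the caller exported it.
  env -u HAKMEM_ACE_ENABLED "$@"
  echo "== ACE on =="
  env HAKMEM_ACE_ENABLED=1 "$@"
}
```

For example, `ace_ab ./your_benchmark` prints both runs back to back under labeled headers.
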
Directory Structure (2025-11-01 Reorganization)
- **benchmarks/** - All benchmark-related files
  - `src/` - Benchmark source code (tiny/mid/comprehensive/stress)
  - `scripts/` - Benchmark scripts organized by category
  - `results/` - Benchmark results (formerly bench_results/)
  - `perf/` - Performance profiling data (formerly perf_data/)
- **tests/** - Test files (unit/integration/stress)
- **core/** - Core allocator implementation
- **docs/** - Documentation (benchmarks/, api/, guides/)
- **scripts/** - Development scripts (build/, apps/, maintenance/)
- **archive/** - Historical documents and analysis
Where to Read More
- **SlabHandle Box**: `docs/SLAB_HANDLE.md` (encapsulation of ownership + remote drain + metadata)
- **Free Safety**: `docs/FREE_SAFETY.md` (fail-fast on double free / class mismatch, and ring operation)
- **Cleanup/Organization**: `CLEANUP_SUMMARY_2025_11_01.md` (latest)
- **Archive**: `archive/README.md` - Historical docs and analysis
- Bench mode: `BENCH_MODE.md`
- Env knobs: `ENV_VARS.md`
- Tiny hot microbench: `TINY_HOT_BENCH.md`
- Frontend/Backend split: `FRONTEND_BACKEND_PLAN.md`
- LD status/safety: `LD_PRELOAD_STATUS.md`
- Goals/Targets: `GOALS_2025_10_29.md`
- Latest results: `BENCH_RESULTS_2025_10_29.md` (today), `BENCH_RESULTS_2025_10_28.md` (yesterday)
- Mainline integration plan: `MAINLINE_INTEGRATION.md`
- FLINT Intelligence (events/adaptation): `FLINT_INTELLIGENCE.md`
Hako / MIR / FFI
- `HAKO_MIR_FFI_SPEC.md` - Spec: type verification completes in the front end, MIR only carries, and FFI performs mechanical lowering.
Notes
- LD mode: keep `HAKMEM_LD_SAFE=2` default for apps; prefer direct-link for tuning.
- Ultra/Frontend are experimental; keep OFF by default and use scripts for A/B.