
HAKMEM Docs Index (2025-10-29)

Purpose
- One‑page map for current work: how to build, run, compare, and tune.
- Focus on Tiny fast‑path tuning vs system/mimalloc, with safe LD guidance.

Quick Build
- Direct link (recommended for perf tuning)
  - `make bench_fast`
  - Run: `HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem`
- PGO (direct link)
  - `./build_pgo.sh` (profile+build)
  - Run: `HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem`
- Shared (LD_PRELOAD) PGO
  - `make pgo-profile-shared && make pgo-build-shared`
  - Run: `HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system`

Direct‑Link Comparisons (CSV)
- Pair (HAKMEM vs mimalloc): `bash scripts/run_comprehensive_pair.sh`
  - CSV: `bench_results/comp_pair_YYYYMMDD_HHMMSS/summary.csv`
- Tiny hot triad (HAKMEM/System/mimalloc): `bash scripts/run_tiny_hot_triad.sh 80000`
  - CSV: `bench_results/tiny_hot_triad_YYYYMMDD_HHMMSS/results.csv`
- Random mixed triad: `bash scripts/run_random_mixed_matrix.sh 120000`
  - CSV: `bench_results/random_mixed_YYYYMMDD_HHMMSS/results.csv`
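
Each script above writes its CSV into a new timestamped directory, so it can help to locate the most recent one automatically. The following is a minimal sketch, not part of the repository: the helper name `latest_csv` is hypothetical, and it assumes the `YYYYMMDD_HHMMSS` suffix sorts chronologically under the shell's glob ordering.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HAKMEM): print the path of the CSV
# in the newest timestamped result directory.
#   $1 = results root (e.g. bench_results)
#   $2 = directory prefix (e.g. comp_pair)
#   $3 = CSV file name (e.g. summary.csv)
latest_csv() {
  local root="$1" prefix="$2" file="$3" dir latest=""
  # Bash expands globs in sorted order, so the last match that actually
  # contains the file is the newest YYYYMMDD_HHMMSS run.
  for dir in "${root}/${prefix}"_*/; do
    [ -f "${dir}${file}" ] && latest="${dir}${file}"
  done
  # Return non-zero (and print nothing) when no result was found.
  [ -n "${latest}" ] && printf '%s\n' "${latest}"
}

# Example usage:
#   latest_csv bench_results comp_pair summary.csv
#   latest_csv bench_results tiny_hot_triad results.csv
```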

Perf‑Main preset (safe, mainline‑oriented)
- Build + run triad: `bash scripts/run_perf_main_triad.sh 60000`
  - Applies recommended tiny env (TLS_SLL=1, REFILL_MAX=96, HOT=192, HYST=16) without bench‑only macros.

Tiny param sweeps
- Basic: `bash scripts/sweep_tiny_params.sh 100000`
- Advanced (SLL multiplier, refill, per‑class MAG, etc.): `bash scripts/sweep_tiny_advanced.sh 80000 --mag64-512`

LD_PRELOAD Apps (opt‑in)
- Script: `bash scripts/run_apps_with_hakmem.sh`
- Default safety: `HAKMEM_LD_SAFE=2` (pass‑through) set in script, then per‑case `LD_PRELOAD` on.
- Recommendation: use direct‑link for perf; LD runs are for stability sampling only.

Tiny Modes and Knobs
- Normal (default): TLS magazine + TLS SLL (≤256B)
  - `HAKMEM_TINY_TLS_SLL=1` (default)
  - `HAKMEM_TINY_MAG_CAP=128` (good tiny bench preset; 64B may prefer 512)
- TinyQuickSlot (minimal front end; experimental)
  - `HAKMEM_TINY_QUICK=1`
  - Keeps `items[6]` in one cache line. On a miss, refills a small batch from the SLL/Mag and returns immediately.
- Ultra (SLL‑only, experimental):
  - `HAKMEM_TINY_ULTRA=1` (opt‑in)
  - `HAKMEM_TINY_ULTRA_VALIDATE=0/1` (perf vs safety)
  - Per‑class overrides: `HAKMEM_TINY_ULTRA_BATCH_C{0..7}`, `HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}`
- FLINT (Fast Lightweight INTelligence): Frontend + deferred Intelligence (experimental)
  - `HAKMEM_TINY_FRONTEND=1` (enable array FastCache; miss falls back)
  - `HAKMEM_TINY_FASTCACHE=1` (low‑level switch; keep OFF unless A/B)
  - `HAKMEM_INT_ENGINE=1` (event ring + BG thread adjusts fill targets)
  - Event extension (internal): timestamp/tier/flags/site_id/thread are accumulated in the ring (off the hot path) for use in future adaptation.
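
The per‑class Ultra overrides listed above are eight separate variables per knob (classes 0 to 7). A minimal sketch for setting all eight at once follows; the helper name `set_ultra_per_class` is hypothetical, and the value in the usage comment is a placeholder, not a recommended setting.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HAKMEM): export one Ultra per-class
# knob for every tiny class C0..C7.
#   $1 = knob kind: BATCH or SLL_CAP
#   $2 = value to apply to all eight classes
set_ultra_per_class() {
  local kind="$1" value="$2" c
  for c in 0 1 2 3 4 5 6 7; do
    # Builds e.g. HAKMEM_TINY_ULTRA_BATCH_C3=<value>
    export "HAKMEM_TINY_ULTRA_${kind}_C${c}=${value}"
  done
}

# Example usage (32 is a placeholder value for illustration only):
#   export HAKMEM_TINY_ULTRA=1
#   set_ultra_per_class BATCH 32
#   ./bench_comprehensive_hakmem
```

Individual classes can still be overridden afterwards with a plain `export`, which is convenient when only one size class is under A/B.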

Best‑Known Presets (direct link)
- Tiny hot focus
  - `export HAKMEM_WRAP_TINY=1`
  - `export HAKMEM_TINY_TLS_SLL=1`
  - `export HAKMEM_TINY_MAG_CAP=128` (64B: try 512)
  - `export HAKMEM_TINY_REMOTE_DRAIN_TRYRATE=0`
  - `export HAKMEM_TINY_REMOTE_DRAIN_THRESHOLD=1000000`
- Memory efficiency A/B
  - `export HAKMEM_TINY_FLUSH_ON_EXIT=1`
  - Run bench/app; compare steady‑state RSS with/without.
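
For the with/without comparison described above, a small wrapper can run the same command twice, toggling a single knob. This is a minimal sketch under stated assumptions: `run_ab` is a hypothetical name, it relies on `env -u` being available (GNU coreutils and BSD `env` both provide it), and it does not measure RSS itself; wrap the command in whatever RSS measurement tool you already use.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HAKMEM): run a command twice,
# once with a knob unset (A) and once with it set (B).
#   $1 = environment variable name
#   $2 = value for the B run
#   remaining arguments = command to run
run_ab() {
  local knob="$1" value="$2"
  shift 2
  echo "== A: ${knob} unset =="
  env -u "${knob}" "$@"
  echo "== B: ${knob}=${value} =="
  env "${knob}=${value}" "$@"
}

# Example usage with the memory-efficiency knob:
#   export HAKMEM_WRAP_TINY=1
#   run_ab HAKMEM_TINY_FLUSH_ON_EXIT 1 ./bench_comprehensive_hakmem
```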
  
Refill Batch (A/B)
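- `HAKMEM_TINY_REFILL_MAX_HOT` (default 192) / `HAKMEM_TINY_REFILL_MAX` (default 64)
- Search for the peak in the small‑size band (8/16/32B). In the current environment, values near the defaults perform best.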

Current Results (high level)
- Tiny hot triad (Perf‑Main, 60–80k cycles, safe):
  - 16–64B: System ≈ 300–335 M; HAKMEM ≈ 250–300 M; mimalloc 535–620 M.
  - 128B:   HAKMEM ≈ 250–270 M; System 170–176 M; mimalloc 575–586 M.
- Comprehensive (direct link): mimalloc ≈ 0.9–1.0B; HAKMEM ≈ 0.25–0.27B.
- Random mixed: three close; mimalloc slightly ahead; HAKMEM ≈ System ± a few %.

Bench‑only highlight (reference values, dedicated build)
- With SLL‑only + warmup + PGO (≤64B), 8–24B exceed 400M; 32B/b100 peaks at 429.18M (System 312.55M).
  - Run: `bash scripts/run_tiny_sllonly_triad.sh 30000` (not included in the safe normal build)

Open Focus
- Close the 16–64B gap (cap/batch tuning; SLL/mini‑mag overhead shave).
- Ultra (opt‑in) stabilization; A/B vs normal.
- Frontend refill heuristics; BG engine stop/join wiring (added).

Mid Range MT (8-32KB, mimalloc-style)
- **Status**: COMPLETE (2025-11-01) - 110M ops/sec achieved ✅
- Quick benchmark: `bash benchmarks/scripts/mid/run_mid_mt_bench.sh`
- Comparison: `bash benchmarks/scripts/mid/compare_mid_mt_allocators.sh`
- Full report: `MID_MT_COMPLETION_REPORT.md`
- Implementation: `core/hakmem_mid_mt.{c,h}`
- Results: 110M ops/sec (100-101% of mimalloc, 2.12x faster than glibc)

ACE Learning Layer (Adaptive Control Engine)
- **Status**: Phase 1 COMPLETE ✅ (2025-11-01) - Infrastructure ready 🚀
- **Goal**: Fix weaknesses with adaptive learning (aiming to surpass mimalloc)
  - Fragmentation stress: 3.87 → 10-20 M ops/s (2.6-5.2x target)
  - Large WS: 22.15 → 30-45 M ops/s (1.4-2.0x target)
  - realloc: 277ns → 140-210ns (1.3-2.0x target)
- **Documentation**:
  - User guide: `docs/ACE_LEARNING_LAYER.md` ✅
  - Technical plan: `docs/ACE_LEARNING_LAYER_PLAN.md` ✅
  - Progress report: `ACE_PHASE1_PROGRESS.md` ✅
- **Phase 1 Deliverables** (COMPLETE ✅):
  - ✅ Metrics collection (`hakmem_ace_metrics.{c,h}`)
  - ✅ UCB1 learning algorithm (`hakmem_ace_ucb1.{c,h}`)
  - ✅ Dual-loop controller (`hakmem_ace_controller.{c,h}`)
  - ✅ Dynamic TLS capacity adjustment
  - ✅ Hot-path metrics integration (alloc/free tracking)
  - ✅ A/B benchmark script (`scripts/bench_ace_ab.sh`)
- **Usage**:
  - Enable: `HAKMEM_ACE_ENABLED=1 ./your_benchmark`
  - Debug: `HAKMEM_ACE_ENABLED=1 HAKMEM_ACE_LOG_LEVEL=2 ./your_benchmark`
  - A/B test: `./scripts/bench_ace_ab.sh`
- **Next**: Phase 2 - Extended benchmarking + learning convergence validation

Directory Structure (2025-11-01 Reorganization)
- **benchmarks/** - All benchmark-related files
  - `src/` - Benchmark source code (tiny/mid/comprehensive/stress)
  - `scripts/` - Benchmark scripts organized by category
  - `results/` - Benchmark results (formerly bench_results/)
  - `perf/` - Performance profiling data (formerly perf_data/)
- **tests/** - Test files (unit/integration/stress)
- **core/** - Core allocator implementation
- **docs/** - Documentation (benchmarks/, api/, guides/)
- **scripts/** - Development scripts (build/, apps/, maintenance/)
- **archive/** - Historical documents and analysis

Where to Read More
- **SlabHandle Box**: `docs/SLAB_HANDLE.md` (encapsulates ownership, remote drain, and metadata)
- **Free Safety**: `docs/FREE_SAFETY.md` (fail‑fast on double free and class mismatch; ring operation)
- **Cleanup/Organization**: `CLEANUP_SUMMARY_2025_11_01.md` (latest)
- **Archive**: `archive/README.md` - Historical docs and analysis
- Bench mode: `BENCH_MODE.md`
- Env knobs: `ENV_VARS.md`
- Tiny hot microbench: `TINY_HOT_BENCH.md`
- Frontend/Backend split: `FRONTEND_BACKEND_PLAN.md`
- LD status/safety: `LD_PRELOAD_STATUS.md`
- Goals/Targets: `GOALS_2025_10_29.md`
- Latest results: `BENCH_RESULTS_2025_10_29.md` (today), `BENCH_RESULTS_2025_10_28.md` (yesterday)
- Mainline integration plan: `MAINLINE_INTEGRATION.md`
- FLINT Intelligence (events/adaptation): `FLINT_INTELLIGENCE.md`
 
Hako / MIR / FFI
- `HAKO_MIR_FFI_SPEC.md`: spec covering type validation completed in the front end, MIR as a pure carrier, and mechanical FFI lowering

Notes
- LD mode: keep `HAKMEM_LD_SAFE=2` default for apps; prefer direct‑link for tuning.
- Ultra/Frontend are experimental; keep OFF by default and use scripts for A/B.