hakmem/docs/specs/DOCS_INDEX.md
Moe Charm (CI) 0546454168 WIP: Add TLS SLL validation and SuperSlab registry fallback
ChatGPT's diagnostic changes to address the TLS_SLL_HDR_RESET issue.
Current status: partial mitigation; the root cause remains.

Changes Applied:
1. SuperSlab Registry Fallback (hakmem_super_registry.h)
   - Added legacy table probe when hash map lookup misses
   - Prevents NULL returns for valid SuperSlabs during initialization
   - Status: Works, but may hide underlying registration issues

2. TLS SLL Push Validation (tls_sll_box.h)
   - Reject push if SuperSlab lookup returns NULL
   - Reject push if class_idx mismatch detected
   - Added [TLS_SLL_PUSH_NO_SS] diagnostic message
   - Status: Prevents list corruption (defensive)

3. SuperSlab Allocation Class Fix (superslab_allocate.c)
   - Pass actual class_idx to sp_internal_allocate_superslab
   - Prevents a dummy class=8 from causing out-of-bounds (OOB) access
   - Status: Root cause fix for the allocation path

4. Debug Output Additions
   - First 256 push/pop operations traced
   - First 4 mismatches logged with details
   - SuperSlab registration state logged
   - Status: Diagnostic tool (not a fix)

5. TLS Hint Box Removed
   - Deleted ss_tls_hint_box.{c,h} (Phase 1 optimization)
   - Simplified to focus on stability first
   - Status: Can be re-added after the root cause is fixed
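The registry fallback (item 1) and push validation (item 2) can be sketched together as below. This is a minimal illustration under stated assumptions, not the code in `hakmem_super_registry.h` or `tls_sll_box.h`. The `SuperSlab` layout, the 1 MiB size-aligned SuperSlab, the direct-mapped hash, and every function name here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: layout, sizes, and names are illustrative only. */
#define SS_SIZE    ((uintptr_t)1 << 20) /* assumed: 1 MiB, size-aligned SuperSlabs */
#define HASH_CAP   64
#define LEGACY_CAP 64

typedef struct SuperSlab { uintptr_t base; int class_idx; } SuperSlab;

static SuperSlab *g_hash[HASH_CAP];     /* primary map, direct-mapped by base */
static SuperSlab *g_legacy[LEGACY_CAP]; /* legacy table, probed only on a miss */
static int g_legacy_n;

static SuperSlab *ss_lookup(const void *p) {
    uintptr_t base = (uintptr_t)p & ~(SS_SIZE - 1);
    SuperSlab *ss = g_hash[(base / SS_SIZE) % HASH_CAP];
    if (ss && ss->base == base) return ss;
    /* Fallback (cold path): linear probe of the legacy table. */
    for (int i = 0; i < g_legacy_n; i++)
        if (g_legacy[i]->base == base) return g_legacy[i];
    return NULL;
}

typedef struct { void *head; unsigned count; } TlsSll;

/* Refuses the push (leaving the list untouched) when the owner is unknown
   or the owning SuperSlab serves a different size class. */
static bool tls_sll_push_checked(TlsSll *sll, void *p, int class_idx) {
    SuperSlab *ss = ss_lookup(p);
    if (!ss) return false;                  /* cf. [TLS_SLL_PUSH_NO_SS] */
    if (ss->class_idx != class_idx) return false;
    *(void **)p = sll->head;                /* next link lives in the free block */
    sll->head = p;
    sll->count++;
    return true;
}
```

Rejecting the push keeps the list consistent, but it only treats the symptom. A lookup that only the legacy probe resolves is itself a signal worth logging.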

Current Problem (REMAINS UNSOLVED):
- [TLS_SLL_HDR_RESET] still occurs after ~60 seconds of sh8bench
- The pointer is offset 16 bytes from the expected value (class 1 → class 2 boundary)
- hak_super_lookup returns NULL for that pointer
- Suggests a use-after-free, a double free, or a pointer arithmetic error

Root Cause Analysis:
- Pattern: Pointer offset by +16 (one class 1 stride)
- Timing: Cumulative problem (appears after 60s, not immediately)
- Location: Header corruption detected during TLS SLL pop

Remaining Issues:
⚠️ Registry fallback is defensive (may hide registration bugs)
⚠️ Push validation prevents symptoms but not root cause
⚠️ 16-byte pointer offset source unidentified

Next Steps for Investigation:
1. Full pointer arithmetic audit (Magazine ⇔ TLS SLL paths)
2. Enhanced logging at HDR_RESET point:
   - Expected vs actual pointer value
   - Pointer provenance (where it came from)
   - Allocation trace for that block
3. Verify the Headerless flag is OFF throughout the build
4. Check for an offset applied twice in pointer conversions
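For step 2, expected vs. actual values and provenance can be captured without flooding the log by using a bounded, thread-safe counter, which mirrors the "first N" limits of the existing debug output. This is a hypothetical sketch. The `PtrSource` tags, the helper name, and the line format are assumptions, not existing hakmem diagnostics.

```c
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical provenance tags: where a block entered the TLS SLL. */
typedef enum { SRC_CARVE, SRC_MAGAZINE, SRC_REMOTE_DRAIN, SRC_FREE } PtrSource;

static const char *const k_src_name[] = { "carve", "magazine", "remote_drain", "free" };

#define HDR_RESET_LOG_MAX 4u   /* log only the first few events */
static atomic_uint g_hdr_reset_logged;

/* Emits one diagnostic line; returns 1 if printed, 0 once the budget is spent. */
static int log_hdr_reset(int class_idx, uintptr_t expected, uintptr_t actual,
                         PtrSource src, const char *site) {
    unsigned n = atomic_fetch_add_explicit(&g_hdr_reset_logged, 1u,
                                           memory_order_relaxed);
    if (n >= HDR_RESET_LOG_MAX) return 0;
    intptr_t delta = (intptr_t)(actual - expected);  /* e.g. +16 for one class 1 stride */
    fprintf(stderr,
            "[TLS_SLL_HDR_RESET] #%u cls=%d expected=0x%" PRIxPTR
            " actual=0x%" PRIxPTR " delta=%+" PRIdPTR " src=%s site=%s\n",
            n, class_idx, expected, actual, delta, k_src_name[src], site);
    return 1;
}
```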

Technical Assessment:
- 60% root cause fixes (allocation class, validation)
- 40% defensive mitigation (registry fallback, push rejection)

Performance Impact:
- Registry fallback: +10-30 cycles on cold path (negligible)
- Push validation: +5-10 cycles per push (acceptable)
- Overall: < 2% performance impact estimated

Related Issues:
- Phase 1 TLS Hint Box removed temporarily
- Phase 2 Headerless blocked until stability achieved

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 20:42:28 +09:00


# HAKMEM Docs Index (2025-10-29)
## Purpose
- One-page map for current work: how to build, run, compare, and tune.
- Focus on Tiny fast-path tuning vs. System/mimalloc, with safe LD_PRELOAD guidance.
## Quick Build
- Direct link (recommended for perf tuning)
  - `make bench_fast`
  - Run: `HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem`
- PGO (direct link)
  - `./build_pgo.sh` (profile + build)
  - Run: `HAKMEM_WRAP_TINY=1 ./bench_comprehensive_hakmem`
- Shared (LD_PRELOAD) PGO
  - `make pgo-profile-shared && make pgo-build-shared`
  - Run: `HAKMEM_WRAP_TINY=1 LD_PRELOAD=./libhakmem.so ./bench_comprehensive_system`
## Direct-Link Comparisons (CSV)
- Pair (HAKMEM vs mimalloc): `bash scripts/run_comprehensive_pair.sh`
  - CSV: `bench_results/comp_pair_YYYYMMDD_HHMMSS/summary.csv`
- Tiny hot triad (HAKMEM/System/mimalloc): `bash scripts/run_tiny_hot_triad.sh 80000`
  - CSV: `bench_results/tiny_hot_triad_YYYYMMDD_HHMMSS/results.csv`
- Random mixed triad: `bash scripts/run_random_mixed_matrix.sh 120000`
  - CSV: `bench_results/random_mixed_YYYYMMDD_HHMMSS/results.csv`
## Perf-Main preset (safe, mainline-oriented)
- Build + run triad: `bash scripts/run_perf_main_triad.sh 60000`
- Applies the recommended tiny env (TLS_SLL=1, REFILL_MAX=96, HOT=192, HYST=16) without bench-only macros.
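The preset is a set of environment variables. Below is a minimal sketch of how an allocator might read such knobs with defaults. The helper names and the [1, 4096] clamp are assumptions. Only the variable names and defaults come from this page (Tiny Modes and Refill Batch); the full names behind HOT and HYST are not listed, so they are left out.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helpers: parse an integer knob, falling back to `def` when the
   value is missing, empty, non-numeric, or outside [lo, hi]. */
static int parse_int_or(const char *s, int def, int lo, int hi) {
    if (!s || !*s) return def;
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < lo || v > hi) return def;
    return (int)v;
}

static int env_int(const char *name, int def, int lo, int hi) {
    return parse_int_or(getenv(name), def, lo, hi);
}

typedef struct { int tls_sll, refill_max, refill_max_hot; } TinyKnobs;

/* Defaults as documented here: TLS SLL on, REFILL_MAX 64, REFILL_MAX_HOT 192.
   The upper bound of 4096 is an assumption for this sketch. */
static TinyKnobs tiny_knobs_load(void) {
    TinyKnobs k;
    k.tls_sll        = env_int("HAKMEM_TINY_TLS_SLL", 1, 0, 1);
    k.refill_max     = env_int("HAKMEM_TINY_REFILL_MAX", 64, 1, 4096);
    k.refill_max_hot = env_int("HAKMEM_TINY_REFILL_MAX_HOT", 192, 1, 4096);
    return k;
}
```

With `HAKMEM_TINY_REFILL_MAX=96` exported, `refill_max` becomes 96. An invalid value such as `abc` silently keeps the default, which a real implementation might prefer to warn about.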
## Tiny param sweeps
- Basic: `bash scripts/sweep_tiny_params.sh 100000`
- Advanced (SLL multipliers, refill, per-class MAG, etc.): `bash scripts/sweep_tiny_advanced.sh 80000 --mag64-512`
## LD_PRELOAD Apps (opt-in)
- Script: `bash scripts/run_apps_with_hakmem.sh`
- Default safety: the script sets `HAKMEM_LD_SAFE=2` (pass-through), then enables `LD_PRELOAD` per case.
- Recommendation: use direct link for perf; LD_PRELOAD runs are for stability sampling only.
## Tiny Modes and Knobs
- Normal (default): TLS magazine + TLS SLL (≤256B)
  - `HAKMEM_TINY_TLS_SLL=1` (default)
  - `HAKMEM_TINY_MAG_CAP=128` (good tiny bench preset; 64B may prefer 512)
- TinyQuickSlot (minimal front; experimental)
  - `HAKMEM_TINY_QUICK=1`
  - Keeps items[6] in a single cache line. On a miss, refills a small batch from SLL/Mag and returns immediately.
- Ultra (SLL-only, experimental):
  - `HAKMEM_TINY_ULTRA=1` (opt-in)
  - `HAKMEM_TINY_ULTRA_VALIDATE=0/1` (perf vs safety)
  - Per-class overrides: `HAKMEM_TINY_ULTRA_BATCH_C{0..7}`, `HAKMEM_TINY_ULTRA_SLL_CAP_C{0..7}`
- FLINT (Fast Lightweight INTelligence): Frontend + deferred Intelligence (experimental)
  - `HAKMEM_TINY_FRONTEND=1` (enables the array FastCache; misses fall back)
  - `HAKMEM_TINY_FASTCACHE=1` (low-level switch; keep OFF unless doing A/B)
  - `HAKMEM_INT_ENGINE=1` (event ring + BG thread adjusts fill targets)
  - Event extension (internal): accumulates timestamp/tier/flags/site_id/thread in the ring (off the hot path), for use in future adaptation.
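The QuickSlot front described above can be sketched as a tiny per-thread stack in front of the SLL. This is a hypothetical illustration that assumes 64-byte cache lines and a refill batch of 3. The names, the batch size, and the LIFO order are assumptions, not the actual `HAKMEM_TINY_QUICK` implementation.

```c
#include <stddef.h>

#define QS_CAP    6  /* items[6], as described above */
#define QS_REFILL 3  /* assumed size of the "small batch" refill */

/* Hypothetical layout: six cached pointers plus a count in one 64-byte line. */
typedef struct {
    _Alignas(64) void *items[QS_CAP];
    unsigned count;
} QuickSlot;
_Static_assert(sizeof(QuickSlot) == 64, "QuickSlot should occupy one cache line");

typedef struct { void *head; } Sll;  /* next pointer stored in each free block */

static void *sll_pop(Sll *s) {
    void *p = s->head;
    if (p) s->head = *(void **)p;
    return p;
}

static void *quick_alloc(QuickSlot *q, Sll *backing) {
    if (q->count) return q->items[--q->count];       /* hit */
    while (q->count < QS_REFILL) {                    /* miss: small refill */
        void *p = sll_pop(backing);
        if (!p) break;
        q->items[q->count++] = p;
    }
    return q->count ? q->items[--q->count] : NULL;    /* NULL: take slow path */
}

static int quick_free(QuickSlot *q, void *p) {
    if (q->count == QS_CAP) return 0;                 /* full: caller uses SLL */
    q->items[q->count++] = p;
    return 1;
}
```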
## Best-Known Presets (direct link)
- Tiny hot focus
  - `export HAKMEM_WRAP_TINY=1`
  - `export HAKMEM_TINY_TLS_SLL=1`
  - `export HAKMEM_TINY_MAG_CAP=128` (64B: try 512)
  - `export HAKMEM_TINY_REMOTE_DRAIN_TRYRATE=0`
  - `export HAKMEM_TINY_REMOTE_DRAIN_THRESHOLD=1000000`
- Memory efficiency A/B
  - `export HAKMEM_TINY_FLUSH_ON_EXIT=1`
  - Run the bench/app; compare steady-state RSS with and without it.
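## Refill Batch (A/B)
- `HAKMEM_TINY_REFILL_MAX_HOT` (default 192) / `HAKMEM_TINY_REFILL_MAX` (default 64)
- Search for the peak in the small-size band (8/16/32B). In the current environment, values near the defaults perform best.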
## Current Results (high level)
- Tiny hot triad (Perf-Main, 60-80k cycles, safe):
  - 16-64B: System ≈ 300-335 M; HAKMEM ≈ 250-300 M; mimalloc 535-620 M.
  - 128B: HAKMEM ≈ 250-270 M; System 170-176 M; mimalloc 575-586 M.
- Comprehensive (direct link): mimalloc ≈ 0.9-1.0B; HAKMEM ≈ 0.25-0.27B.
- Random mixed: all three are close; mimalloc is slightly ahead; HAKMEM ≈ System ± a few %.
## Bench-only highlight (reference values, dedicated build)
- SLL-only + warmup + PGO (≤64B): 8-24B exceeds 400M; 32B/b100 peaks at 429.18M (System 312.55M).
- Run: `bash scripts/run_tiny_sllonly_triad.sh 30000` (not included in the safe normal build).
## Open Focus
- Close the 16-64B gap (cap/batch tuning; shave SLL/mini-mag overhead).
- Ultra (opt-in) stabilization; A/B vs normal.
- Frontend refill heuristics; BG engine stop/join wiring (added).
## Mid Range MT (8-32KB, mimalloc-style)
- **Status**: COMPLETE (2025-11-01) - 110M ops/sec achieved ✅
- Quick benchmark: `bash benchmarks/scripts/mid/run_mid_mt_bench.sh`
- Comparison: `bash benchmarks/scripts/mid/compare_mid_mt_allocators.sh`
- Full report: `MID_MT_COMPLETION_REPORT.md`
- Implementation: `core/hakmem_mid_mt.{c,h}`
- Results: 110M ops/sec (100-101% of mimalloc, 2.12x faster than glibc)
## ACE Learning Layer (Adaptive Control Engine)
- **Status**: Phase 1 COMPLETE ✅ (2025-11-01) - Infrastructure ready 🚀
- **Goal**: Fix weaknesses with adaptive learning (aiming to surpass mimalloc)
  - Fragmentation stress: 3.87 → 10-20 M ops/s (2.6-5.2x target)
  - Large WS: 22.15 → 30-45 M ops/s (1.4-2.0x target)
  - realloc: 277ns → 140-210ns (1.3-2.0x target)
- **Documentation**:
  - User guide: `docs/ACE_LEARNING_LAYER.md`
  - Technical plan: `docs/ACE_LEARNING_LAYER_PLAN.md`
  - Progress report: `ACE_PHASE1_PROGRESS.md`
  - Learning layer overview: `docs/analysis/LEARNING_LAYER_OVERVIEW.md`
  - Learning A/B results: `docs/benchmarks/LEARNING_AB_RESULTS.md` (growing log)
- **Phase 1 Deliverables** (COMPLETE ✅):
  - ✅ Metrics collection (`hakmem_ace_metrics.{c,h}`)
  - ✅ UCB1 learning algorithm (`hakmem_ace_ucb1.{c,h}`)
  - ✅ Dual-loop controller (`hakmem_ace_controller.{c,h}`)
  - ✅ Dynamic TLS capacity adjustment
  - ✅ Hot-path metrics integration (alloc/free tracking)
  - ✅ A/B benchmark script (`scripts/bench_ace_ab.sh`)
- **Usage**:
  - Enable: `HAKMEM_ACE_ENABLED=1 ./your_benchmark`
  - Debug: `HAKMEM_ACE_ENABLED=1 HAKMEM_ACE_LOG_LEVEL=2 ./your_benchmark`
  - A/B test: `./scripts/bench_ace_ab.sh`
- **Next**: Phase 2 - Extended benchmarking + learning convergence validation
## Directory Structure (2025-11-01 Reorganization)
- **benchmarks/** - All benchmark-related files
  - `src/` - Benchmark source code (tiny/mid/comprehensive/stress)
  - `scripts/` - Benchmark scripts organized by category
  - `results/` - Benchmark results (formerly bench_results/)
  - `perf/` - Performance profiling data (formerly perf_data/)
- **tests/** - Test files (unit/integration/stress)
- **core/** - Core allocator implementation
- **docs/** - Documentation (benchmarks/, api/, guides/)
- **scripts/** - Development scripts (build/, apps/, maintenance/)
- **archive/** - Historical documents and analysis
## Where to Read More
- **SlabHandle Box**: `docs/SLAB_HANDLE.md` (encapsulates ownership, remote drain, and metadata)
- **Free Safety**: `docs/FREE_SAFETY.md` (Fail-Fast on double free / class mismatch, and ring operation)
- **Cleanup/Organization**: `CLEANUP_SUMMARY_2025_11_01.md` (latest)
- **Archive**: `archive/README.md` - Historical docs and analysis
- Bench mode: `BENCH_MODE.md`
- Env knobs: `ENV_VARS.md`
- Tiny hot microbench: `TINY_HOT_BENCH.md`
- Frontend/Backend split: `FRONTEND_BACKEND_PLAN.md`
- LD status/safety: `LD_PRELOAD_STATUS.md`
- Goals/Targets: `GOALS_2025_10_29.md`
- Latest results: `BENCH_RESULTS_2025_10_29.md` (today), `BENCH_RESULTS_2025_10_28.md` (yesterday)
- Mainline integration plan: `MAINLINE_INTEGRATION.md`
- FLINT Intelligence (events/adaptation): `FLINT_INTELLIGENCE.md`
## Hako / MIR / FFI
- `HAKO_MIR_FFI_SPEC.md`: specification in which the front end completes type verification, MIR only carries the types, and FFI is a mechanical lowering.
## Notes
- LD mode: keep `HAKMEM_LD_SAFE=2` as the default for apps; prefer direct link for tuning.
- Ultra/Frontend are experimental; keep them OFF by default and use the scripts for A/B.