hakmem

Author	SHA1	Message	Date
Moe Charm (CI)	19056282b6	Phase 3 D2: Wrapper Env Cache - [DECISION: NO-GO] Target: Reduce wrapper_env_cfg() overhead in malloc/free hot path - Strategy: Cache wrapper env configuration pointer in TLS - Approach: Fast pointer cache (TLS caches const wrapper_env_cfg_t*) Implementation: - core/box/wrapper_env_cache_env_box.h: ENV gate (HAKMEM_WRAP_ENV_CACHE) - core/box/wrapper_env_cache_box.h: TLS cache layer (wrapper_env_cfg_fast) - core/box/hak_wrappers.inc.h: Integration into malloc/free hot paths - ENV gate: HAKMEM_WRAP_ENV_CACHE=0/1 (default OFF) A/B Test Results (Mixed, 10-run, 20M iters): - Baseline (D2=0): 46.52M ops/s (avg), 46.47M ops/s (median) - Optimized (D2=1): 45.85M ops/s (avg), 45.98M ops/s (median) - Improvement: avg -1.44%, median -1.05% (DECISION: NO-GO) Analysis: - Regression cause: TLS cache adds overhead (branch + TLS access) - wrapper_env_cfg() is already minimal (pointer return after simple check) - Adding TLS caching layer makes it worse, not better - Branch prediction penalty outweighs any potential savings Cumulative Phase 2-3: - B3: +2.89%, B4: +1.47%, C3: +2.20% - D1: +1.06% (opt-in), D2: -1.44% (NO-GO) - Total: ~7.2% (excluding D2) Decision: FREEZE as research box (default OFF, regression confirmed) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-13 22:03:27 +09:00
Moe Charm (CI)	acc64f2438	Phase ML1: Pool v1 memset 89.73% overhead 軽量化 (+15.34% improvement) ## Summary - ChatGPT により bench_profile.h の setenv segfault を修正（RTLD_NEXT 経由に切り替え） - core/box/pool_zero_mode_box.h 新設：ENV キャッシュ経由で ZERO_MODE を統一管理 - core/hakmem_pool.c で zero mode に応じた memset 制御（FULL/header/off） - A/B テスト結果：ZERO_MODE=header で +15.34% improvement（1M iterations, C6-heavy） ## Files Modified - core/box/pool_api.inc.h: pool_zero_mode_box.h include - core/bench_profile.h: glibc setenv → malloc+putenv（segfault 回避） - core/hakmem_pool.c: zero mode 参照・制御ロジック - core/box/pool_zero_mode_box.h (新設): enum/getter - CURRENT_TASK.md: Phase ML1 結果記載 ## Test Results \| Iterations \| ZERO_MODE=full \| ZERO_MODE=header \| Improvement \| \|-----------\|----------------\|-----------------\|------------\| \| 10K \| 3.06 M ops/s \| 3.17 M ops/s \| +3.65% \| \| 1M \| 23.71 M ops/s \| 27.34 M ops/s \| +15.34% \| 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 09:08:18 +09:00
Moe Charm (CI)	fda6cd2e67	Boxify superslab registry, add bench profile, and document C7 hotpath experiments	2025-12-07 03:12:27 +09:00
Moe Charm (CI)	bcfb4f6b59	Remove dead code: UltraHot, RingCache, FrontC23, Class5 Hotpath (cherry-picked from 225b6fcc7, conflicts resolved)	2025-11-26 12:33:49 +09:00
Moe Charm (CI)	fcf098857a	Phase12 debug: restore SUPERSLAB constants/APIs, implement Box2 drain boundary, fix tiny_fast_pop to return BASE, honor TLS SLL toggle in alloc/free fast paths, add fail-fast stubs, and quiet capacity sentinel. Update CURRENT_TASK with A/B results (SLL-off stable; SLL-on crash).	2025-11-14 01:02:00 +09:00
Moe Charm (CI)	72b38bc994	Phase E3-FINAL: Fix Box API offset bugs - ALL classes now use correct offsets ## Root Cause Analysis (GPT5) Physical Layout Constraints: - Class 0: 8B = [1B header][7B payload] → offset 1 = 9B needed = ❌ IMPOSSIBLE - Class 1-6: >=16B = [1B header][15B+ payload] → offset 1 = ✅ POSSIBLE - Class 7: 1KB → offset 0 (compatibility) Correct Specification: - HAKMEM_TINY_HEADER_CLASSIDX != 0: - Class 0, 7: next at offset 0 (overwrites header when on freelist) - Class 1-6: next at offset 1 (after header) - HAKMEM_TINY_HEADER_CLASSIDX == 0: - All classes: next at offset 0 Previous Bug: - Attempted "ALL classes offset 1" unification - Class 0 with offset 1 caused immediate SEGV (9B > 8B block size) - Mixed 2-arg/3-arg API caused confusion ## Fixes Applied ### 1. Restored 3-Argument Box API (core/box/tiny_next_ptr_box.h) ```c // Correct signatures void tiny_next_write(int class_idx, void* base, void* next_value) void* tiny_next_read(int class_idx, const void* base) // Correct offset calculation size_t offset = (class_idx == 0 \|\| class_idx == 7) ? 0 : 1; ``` ### 2. Updated 123+ Call Sites Across 34 Files - hakmem_tiny_hot_pop_v4.inc.h (4 locations) - hakmem_tiny_fastcache.inc.h (3 locations) - hakmem_tiny_tls_list.h (12 locations) - superslab_inline.h (5 locations) - tiny_fastcache.h (3 locations) - ptr_trace.h (macro definitions) - tls_sll_box.h (2 locations) - + 27 additional files Pattern: `tiny_next_read(base)` → `tiny_next_read(class_idx, base)` Pattern: `tiny_next_write(base, next)` → `tiny_next_write(class_idx, base, next)` ### 3. Added Sentinel Detection Guards - tiny_fast_push(): Block nodes with sentinel in ptr or ptr->next - tls_list_push(): Block nodes with sentinel in ptr or ptr->next - Defense-in-depth against remote free sentinel leakage ## Verification (GPT5 Report) Test Command: `./out/release/bench_random_mixed_hakmem --iterations=70000` Results: - ✅ Main loop completed successfully - ✅ Drain phase completed successfully - ✅ NO SEGV (previous crash at iteration 66151 is FIXED) - ℹ️ Final log: "tiny_alloc(1024) failed" is normal fallback to Mid/ACE layers Analysis: - Class 0 immediate SEGV: ✅ RESOLVED (correct offset 0 now used) - 66K iteration crash: ✅ RESOLVED (offset consistency fixed) - Box API conflicts: ✅ RESOLVED (unified 3-arg API) ## Technical Details ### Offset Logic Justification ``` Class 0: 8B block → next pointer (8B) fits ONLY at offset 0 Class 1: 16B block → next pointer (8B) fits at offset 1 (after 1B header) Class 2: 32B block → next pointer (8B) fits at offset 1 ... Class 6: 512B block → next pointer (8B) fits at offset 1 Class 7: 1024B block → offset 0 for legacy compatibility ``` ### Files Modified (Summary) - Core API: `box/tiny_next_ptr_box.h` - Hot paths: `hakmem_tiny_hot_pop.inc.h`, `tiny_fastcache.h` - TLS layers: `hakmem_tiny_tls_list.h`, `hakmem_tiny_tls_ops.h` - SuperSlab: `superslab_inline.h`, `tiny_superslab_.inc.h` - Refill: `hakmem_tiny_refill.inc.h`, `tiny_refill_opt.h` - Free paths: `tiny_free_magazine.inc.h`, `tiny_superslab_free.inc.h` - Documentation: Multiple Phase E3 reports ## Remaining Work None for Box API offset bugs - all structural issues resolved. Future enhancements (non-critical): - Periodic `grep -R '(void*)' core/` to detect direct pointer access violations - Enforce Box API usage via static analysis - Document offset rationale in architecture docs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 06:50:20 +09:00
Moe Charm (CI)	862e8ea7db	Infrastructure and build updates - Update build configuration and flags - Add missing header files and dependencies - Update TLS list implementation with proper scoping - Fix various compilation warnings and issues - Update debug ring and tiny allocation infrastructure - Update benchmark results documentation Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>	2025-11-11 21:49:05 +09:00
Moe Charm (CI)	1b6624dec4	Fix debug build: gate Tiny observation snapshot in hakmem_tiny_stats.c behind HAKMEM_TINY_OBS_ENABLE to avoid incomplete TinyObsStats and missing globals. Now debug build passes, enabling C7 triage with fail‑fast guards.	2025-11-10 03:00:00 +09:00
Moe Charm (CI)	d9b334b968	Tiny: Enable P0 batch refill by default + docs and task update Summary - Default P0 ON: Build-time HAKMEM_TINY_P0_BATCH_REFILL=1 remains; runtime gate now defaults to ON (HAKMEM_TINY_P0_ENABLE unset or not '0'). Kill switch preserved via HAKMEM_TINY_P0_DISABLE=1. - Fix critical bug: After freelist→SLL batch splice, increment TinySlabMeta::used by 'from_freelist' to mirror non-P0 behavior (prevents under-accounting and follow-on carve invariants from breaking). - Add low-overhead A/B toggles for triage: HAKMEM_TINY_P0_NO_DRAIN (skip remote drain), HAKMEM_TINY_P0_LOG (emit [P0_COUNTER_OK/MISMATCH] based on total_active_blocks delta). - Keep linear carve fail-fast guards across simple/general/TLS-bump paths. Perf (1T, 100k×256B) - P0 OFF: ~2.73M ops/s (stable) - P0 ON (no drain): ~2.45M ops/s - P0 ON (normal drain): ~2.76M ops/s (fastest) Known - Rare [P0_COUNTER_MISMATCH] warnings persist (non-fatal). Continue auditing active/used balance around batch freelist splice and remote drain splice. Docs - Add docs/TINY_P0_BATCH_REFILL.md (runtime switches, behavior, perf notes). - Update CURRENT_TASK.md with Tiny P0 status (default ON) and next steps.	2025-11-09 22:12:34 +09:00
Moe Charm (CI)	1010a961fb	Tiny: fix header/stride mismatch and harden refill paths - Root cause: header-based class indexing (HEADER_CLASSIDX=1) wrote a 1-byte header during allocation, but linear carve/refill and initial slab capacity still used bare class block sizes. This mismatch could overrun slab usable space and corrupt freelists, causing reproducible SEGV at ~100k iters. Changes - Superslab: compute capacity with effective stride (block_size + header for classes 0..6; class7 remains headerless) in superslab_init_slab(). Add a debug-only bound check in superslab_alloc_from_slab() to fail fast if carve would exceed usable bytes. - Refill (non-P0 and P0): use header-aware stride for all linear carving and TLS window bump operations. Ensure alignment/validation in tiny_refill_opt.h also uses stride, not raw class size. - Drain: keep existing defense-in-depth for remote sentinel and sanitize nodes before splicing into freelist (already present). Notes - This unifies the memory layout across alloc/linear-carve/refill with a single stride definition and keeps class7 (1024B) headerless as designed. - Debug builds add fail-fast checks; release builds remain lean. Next - Re-run Tiny benches (256/1024B) in debug to confirm stability, then in release. If any remaining crash persists, bisect with HAKMEM_TINY_P0_BATCH_REFILL=0 to isolate P0 batch carve, and continue reducing branch-miss as planned.	2025-11-09 18:55:50 +09:00

10 Commits