hakmem

Author	SHA1	Message	Date
Moe Charm (CI)	e2ca52d59d	Phase v6-6: Inline hot path optimization for SmallObject Core v6 Optimize v6 alloc/free by eliminating redundant route checks and adding inline hot path functions: - smallobject_core_v6_box.h: Add inline hot path functions: - small_alloc_c6_hot_v6() / small_alloc_c5_hot_v6(): Direct TLS pop - small_free_c6_hot_v6() / small_free_c5_hot_v6(): Direct TLS push - No route check needed (caller already validated via switch case) - smallobject_core_v6.c: Add cold path functions: - small_alloc_cold_v6(): Handle TLS refill from page - small_free_cold_v6(): Handle page freelist push (TLS full/cross-thread) - malloc_tiny_fast.h: Update front gate to use inline hot path: - Alloc: hot path first, cold path fallback on TLS miss - Free: hot path first, cold path fallback on TLS full Performance results: - C5-heavy: v6 ON 42.2M ≈ baseline (parity restored) - C6-heavy: v6 ON 34.5M ≈ baseline (parity restored) - Mixed 16-1024B: ~26.5M (v3-only: ~28.1M, gap is routing overhead) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 15:59:29 +09:00
Moe Charm (CI)	1e04debb1b	Phase v6-5: C5 extension for SmallObject Core v6 Extend v6 architecture to support C5 (129-256B) in addition to C6 (257-512B): - SmallHeapCtxV6: Add tls_freelist_c5[32] and tls_count_c5 for C5 TLS cache - smallsegment_v6_box.h: Add SMALL_V6_C5_CLASS_IDX (5) and C5_BLOCK_SIZE (256) - smallobject_cold_iface_v6.c: Generalize refill_page for both C5 (256 blocks/page) and C6 (128 blocks/page) - smallobject_core_v6.c: Add C5 fast path (alloc/free) with TLS batching Performance (v6 C5 enabled): - C5-heavy: 41.0M ops/s (-23% vs v6 OFF 53.6M) - needs optimization - Mixed: 36.2M ops/s (-18% vs v6 OFF 44.0M) - functional baseline Note: C5 route requires optimization in next phase to match v6-3 performance. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 15:50:14 +09:00
Moe Charm (CI)	c60199182e	Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor Phase v6-1: C6-only route stub (v1/pool fallback) Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation - 2MiB segment / 64KiB page allocation - O(1) ptr→page_meta lookup with segment masking - C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s) Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching) - TLS ownership fast-path skip page_meta for 90%+ of frees - Batch header writes during refill (32 allocs = 1 header write) - TLS batch refill (1/32 refill frequency) - C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅ Phase v6-4: Mixed hang fix (segment metadata lookup correction) - Root cause: metadata lookup was reading mmap region instead of TLS slot - Fix: use TLS slot descriptor with in_use validation - Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅ Phase v6-refactor: Code quality improvements (macro unification + inline + docs) - Add SMALL_V6_* prefix macros (header, pointer conversion, page index) - Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6) - Doxygen-style comments for all public functions - Result: 0 compiler warnings, maintained +1.2% performance Files: - core/box/smallobject_core_v6_box.h (new, type & API definitions) - core/box/smallobject_cold_iface_v6.h (new, cold iface API) - core/box/smallsegment_v6_box.h (new, segment type definitions) - core/smallobject_core_v6.c (new, C6 alloc/free implementation) - core/smallobject_cold_iface_v6.c (new, refill/retire logic) - core/smallsegment_v6.c (new, segment allocator) - docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document) - core/box/tiny_route_env_box.h (modified, v6 route added) - core/front/malloc_tiny_fast.h (modified, v6 case in route switch) - Makefile (modified, v6 objects added) - CURRENT_TASK.md (modified, v6 status added) Status: - C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅ - Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅ - Build: 0 warnings, fully documented ✅ 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 15:29:59 +09:00
Moe Charm (CI)	8789542a9f	Phase v5-7: C6 ULTRA pattern (research mode, 32-slot TLS freelist) Implementation: - ENV: HAKMEM_SMALL_HEAP_V5_ULTRA_C6_ENABLED=0\|1 (default: 0) - SmallHeapCtxV5: added c6_tls_freelist[32], c6_tls_count, ultra_c6_enabled - small_segment_v5_owns_ptr_fast(): lightweight segment check for free path - small_alloc_slow_v5_c6_refill(): batch TLS fill from page freelist - small_free_slow_v5_c6_drain(): drain half of TLS to page on overflow Performance (C6-heavy 257-768B, 2M iters, ws=400): - v5 OFF baseline: 47M ops/s - v5 ULTRA: 37-38M ops/s (-20%) - vs v5 base (no opts): +3-5% improvement Design limitation identified: - Header write required on every alloc (freelist overwrites header byte) - Segment validation required on every free - page->used tracking required for retirement - These prevent matching baseline pool v1 performance 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 13:32:46 +09:00
Moe Charm (CI)	f191774c1e	Phase v5-6: TLS batching for C6 v5 - Add HAKMEM_SMALL_HEAP_V5_BATCH_ENABLED ENV gate (default: 0) - Add SmallV5Batch struct with 4-slot buffer in SmallHeapCtxV5 - Integrate batch alloc/free paths (after cache, before freelist) - Fix pre-existing build error in tiny_free_magazine.inc.h (ss_time/tss undeclared) Benchmarks (C6 257-768B): - Batch OFF: 36.71M ops/s → Batch ON: 37.78M ops/s (+2.9%) - Mixed 16-1024B: batch ON 37.09M vs OFF 38.25M (-3%, within noise) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 12:53:03 +09:00
Moe Charm (CI)	2f5d53fd6d	Phase v5-5: TLS cache for C6 v5 Add 1-slot TLS cache to C6 v5 to reduce page_meta access overhead. Implementation: - Add HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED ENV (default: 0) - SmallHeapCtxV5: add c6_cached_block field for TLS cache - alloc: cache hit bypasses page_meta lookup, returns immediately - free: empty cache stores block, full cache evicts old block first Results (1M iter, ws=400, HEADER_MODE=full): - C6-heavy (257-768B): 35.53M → 37.02M ops/s (+4.2%) - Mixed 16-1024B: 38.04M → 37.93M ops/s (-0.3%, noise) Known issue: header_mode=light has infinite loop bug (freelist pointer/header collision). Full mode only for now. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 07:40:22 +09:00
Moe Charm (CI)	2a548875b8	Phase v5-4: Header light mode & freelist optimization Implements header write optimization for C6 v5 allocator by moving header initialization from per-alloc time to carve time (during page refill). This eliminates redundant header writes on the hot path. Implementation: - Added HAKMEM_SMALL_HEAP_V5_HEADER_MODE ENV (full\|light, default: full) - Added header_mode field to SmallHeapCtxV5 (cached per-thread) - Modified alloc fast/slow paths to skip header write in light mode - Modified refill to write headers during carve in light mode - Free path unchanged (header validation still works) Benchmark Results (2M iterations, ws=400): C6-HEAVY (257-768B): - Baseline (v5 OFF): 47.95 Mops/s - v5 full mode: 38.97 Mops/s (-18.7% vs baseline) - v5 light mode: 39.25 Mops/s (-18.1% vs baseline, +0.7% vs full) MIXED 16-1024B: - v5 OFF: 43.59 Mops/s - v5 full mode: 36.53 Mops/s (-16.2% vs OFF) - v5 light mode: 38.04 Mops/s (-12.7% vs OFF, +4.1% vs full) Analysis: - Light mode shows modest improvement over full (+0.7-4.1%) - C6 v5 performance gap vs baseline (-18%) indicates need for further optimization beyond header writes - Mixed workload benefits more from light mode (+4.1% vs full) - No regressions in safety/correctness observed Research findings: - Header write optimization alone insufficient to close v5 gap - Need to investigate other hot path costs (freelist ops, metadata access) - Light mode validates the carve-time header concept 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 05:12:39 +09:00
Moe Charm (CI)	7b5ee8cee2	Phase v5-3: O(1) path optimization for C6-only v5 - Single TLS segment (eliminates slot search loop) - O(1) page_meta_of() (direct segment range check, no iteration) - __builtin_ctz for O(1) free page finding in bitmap - Simplified free path using page_meta_of() only (no find_page) - Partial limit 1 (minimal list traversal) Performance: - Before (v5-2): 14.7M ops/s - After (v5-3): 38.5M ops/s (+162%) - vs baseline: 44.9M ops/s (-14%) - SEGV: None, stable at ws=800 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 04:33:16 +09:00
Moe Charm (CI)	4c2869397f	Phase v5-3: SmallObject v5 定数・マクロ箱化リファクタリング改善内容: - 定数を box.h に統一 (C6_CLASS_IDX, BLOCK_SIZE, PARTIAL_LIMIT) - List helper をマクロ化 (SMALL_PAGE_V5_PUSH_PARTIAL等) - 重複関数 (page_push_partial等) を削除 - page_loc_t enum を box.h に移動効果: - hotbox_v5.c: 339行 → 263行 (76行削減) - コード重複排除 (マクロで管理) - 将来の拡張性向上 - 型安全性維持 (GCC statement expressions使用) テスト: - ビルド成功 - v5 OFF/ON 両方で動作確認 - 性能変化なし (リファクタリングのみ) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 04:24:20 +09:00
Moe Charm (CI)	e0fb7d550a	Phase v5-2: SmallObject v5 C6-only 本実装 (WIP - header fix) 本実装修正: - tiny_region_id_write_header() を追加: USER pointer を正しく返す - TLS slot からの segment 探索 (page_meta_of) - Page-level allocation で segment 再利用 - 2MiB alignment 保証 (4MiB 確保 + alignment) - free パスの route 修正 (v4 から v5 への fallthrough 削除) 動作確認: - SEGV 消失: alloc/free 基本動作 OK - 性能: ~18-20M ops/s (baseline 43-47M の約 40-45%) - 回帰原因: TLS slot 線形探索 O(n)、find_page O(n) 残タスク: - O(1) segment lookup 最適化 (hash または array 直接参照) - find_page 除去 (segment lookup 成功時) - partial_count/list 管理の最適化 ENV デフォルト OFF なので本線影響なし。 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 04:14:51 +09:00
Moe Charm (CI)	9c24bebf08	Phase v5-1: SmallObject v5 C6-only route stub 接続 - tiny_route_env_box.h: TINY_ROUTE_SMALL_HEAP_V5 enum 追加、route snapshot で C6→v5 分岐 - malloc_tiny_fast.h: alloc/free switch に v5 case 追加（v1/pool fallback） - smallobject_hotbox_v5.c: stub 実装（alloc は NULL 返却、free は no-op） - smallobject_hotbox_v5_box.h: 関数 signature に ctx パラメータ追加 - Makefile: core/smallobject_hotbox_v5.o をリンクリストに追加 - ENV_PROFILE_PRESETS.md: v5-1 プリセット追記 - CURRENT_TASK.md: Phase v5-1 完了記録特性: - ENV: HAKMEM_SMALL_HEAP_V5_ENABLED=1 / HAKMEM_SMALL_HEAP_V5_CLASSES=0x40 で opt-in - テスト結果: C6-heavy (v5 OFF 15.5M → v5 ON 16.4M ops/s, 正常), Mixed 47.2M ops/s, SEGV/assert なし - 挙動は v1/pool fallback と同じ（実装は v5-2） 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 03:25:37 +09:00
Moe Charm (CI)	dedfea27d5	Phase v5-0 refactor: ENV統一・マクロ化・構造体最適化 - ENV initialization を sentinel パターンで統一 - ENV_UNINIT/ENABLED/DISABLED 定数追加 - __builtin_expect で初期化チェックを最適化 - small_heap_v5_enabled/class_mask を統一パターンに変更 - ポインタマクロ化（O(1) segment/page 計算） - SMALL_SEGMENT_V5_BASE_FROM_PTR: ptr から segment base を mask で計算 - SMALL_SEGMENT_V5_PAGE_IDX: segment 内の page_idx を shift で計算 - SMALL_SEGMENT_V5_PAGE_META: page_meta への O(1) access（bounds check付き） - SMALL_SEGMENT_V5_VALIDATE_MAGIC: magic 検証 - SMALL_SEGMENT_V5_VALIDATE_PTR: Fail-Fast validation pipeline - SmallClassHeapV5 に partial_count 追加 - partial ページリストのカウンタを追加（refill/retire 最適化用） - SmallPageMetaV5 の field 再配置（L1 cache 最適化） - hot fields (free_list, used, capacity) を先頭に集約 - metadata (class_idx, flags, page_idx, segment) を後方配置 - total 24B、offset コメント追加 - route priority ENV 追加 - HAKMEM_ROUTE_PRIORITY={v4\|v5\|auto}（default: v4） - enum small_route_priority 定義 - small_route_priority() 関数追加 - segment_size override ENV 追加 - HAKMEM_SMALL_HEAP_V5_SEGMENT_SIZE（default: 2MiB） - power of 2 & >= 64KiB validation 挙動: 完全不変（v5 route は呼ばれない、ENV default OFF）テスト: Mixed 16–1024B で 43.0–43.8M ops/s（変化なし）、SEGV/assert なし 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 03:19:18 +09:00
Moe Charm (CI)	83d4096fbc	Phase v5-0: SmallObject v5 の設計・型/IF/ENV スケルトン追加設計ドキュメント: - docs/analysis/SMALLOBJECT_V5_DESIGN.md: v5 アーキテクチャ全体設計新規ファイル (v5 スケルトン): - core/box/smallobject_hotbox_v5_box.h: HotBox v5 型定義 - core/box/smallsegment_v5_box.h: Segment v5 型定義 - core/box/smallobject_cold_iface_v5.h: ColdIface v5 IF宣言 - core/box/smallobject_v5_env_box.h: ENV ゲート - core/smallobject_hotbox_v5.c: 実装 stub (完全 fallback) 特徴: ✅ 型とインターフェースのみ定義（v5-0 は機能なし） ✅ ENV デフォルト OFF（HAKMEM_SMALL_HEAP_V5_ENABLED=0） ✅ 挙動完全不変（Mixed/C6 benchmark 確認済み） ✅ v4 との区別を明確化 (*_v5 suffix) ✅ v5-1 (stub) → v5-2 (本実装) → v5-3 (Mixed) への段階実装準備完了フェーズ: - v5-0: 型定義のみ（現在） - v5-1: C6-only stub route 追加 - v5-2: Segment/HotBox 本実装 (C6-only bench A/B) - v5-3: Mixed での段階昇格 (C6 → C5 → ...) 目標性能: Mixed 16–1024B で 50–60M ops/s (mimalloc の 5割) 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 03:09:57 +09:00
Moe Charm (CI)	bdfa32d869	Phase v4-mid-SEGV: C6 v4 を SmallSegment 専用に切り替え、TinyHeap SEGV を解決問題: C6 v4 が TinyHeap のページを共有することで iters >= 800k で freelist 破壊 → SEGV 発生修正内容: - c6_segment_alloc_page_direct(): C6 専用ページ割当 (SmallSegment v4 経由, TinyHeap 非共有) - c6_segment_release_page_direct(): C6 専用ページ返却 - cold_refill_page_v4() で C6 を分岐: SmallSegment 直接使用 - cold_retire_page_v4() で C6 を分岐: SmallSegment に直接返却 - fastlist state reset 処理追加 (L392-399) 結果: ✅ iters=1M, ws <= 390 で SEGV 消失 ✅ C6-only: v4 OFF ~47M → v4 ON ~43M ops/s (−8.5%, 安定) ✅ Mixed: v4 ON で SEGV なし (小幅回帰許容) 方針: C6 v4 は研究箱として安定化完了。本線には載せない (既存 mid/pool v1 使用)。 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 02:39:32 +09:00
Moe Charm (CI)	e486dd2c55	Phase v4-mid-6: Implement C6 v4 TLS Fastlist (Gated) - Implemented TLS fastlist logic for C6 in smallobject_hotbox_v4.c (alloc/free). - Added SmallC6FastState struct and g_small_c6_fast TLS variable. - Gated the fastlist logic with HAKMEM_SMALL_HEAP_V4_FASTLIST (default OFF) due to observed instability in mixed workloads. - Fixed a memory leak in small_heap_free_fast_v4 fallback path by calling hak_pool_free. - Updated CURRENT_TASK.md with phase report.	2025-12-11 01:44:08 +09:00
Moe Charm (CI)	dd974b49c5	Phase v4-mid-2, v4-mid-3, v4-mid-5: SmallObject HotBox v4 implementation and docs update Implementation: - SmallObject HotBox v4 (core/smallobject_hotbox_v4.c) now fully implements C6-only allocations and frees, including current/partial management and freelist operations. - Cold Iface (tiny_heap based) for page refill/retire is integrated. - Stats instrumentation (v4-mid-5) added to small_heap_alloc_fast_v4 and small_heap_free_fast_v4, with a new header file core/box/smallobject_hotbox_v4_stats_box.h and atexit dump function. Updates: - CURRENT_TASK.md has been condensed and updated with summaries of Phase v4-mid-2 (C6-only v4), Phase v4-mid-3 (C5-only v4 pilot), and the stats implementation (v4-mid-5). - docs/analysis/SMALLOBJECT_V4_BOX_DESIGN.md updated with A/B results and conclusions for C6-only and C5-only v4 implementations. - The previous CURRENT_TASK.md content has been archived to CURRENT_TASK_ARCHIVE_20251210.md.	2025-12-11 01:01:15 +09:00
Moe Charm (CI)	3b4449d773	Phase v4-mid-1: C6-only v4 route + page_meta_of() Fail-Fast validation Implementation: - SMALL_SEGMENT_V4_* constants (SIZE=2MiB, PAGE_SIZE=64KiB, MAGIC=0xDEADBEEF) - smallsegment_v4_page_meta_of(): O(1) mask+shift lookup with magic validation - Computes segment base: addr & ~(2MiB - 1) - Verifies SmallSegment magic number - Calculates page_idx: (addr - seg_base) >> PAGE_SHIFT (16) - Returns non-NULL sentinel for now (full page_meta[] in Phase v4-mid-2) Stubs for C6-only phase: - small_heap_alloc_fast_v4(): C6 returns NULL → pool v1 fallback - small_heap_free_fast_v4(): C6 calls page_meta_of() for Fail-Fast, then pool v1 fallback Documentation: - ENV_PROFILE_PRESETS.md: Add "C6_ONLY_SMALLOBJECT_V4" research profile - HAKMEM_SMALL_HEAP_V4_ENABLED=1, HAKMEM_SMALL_HEAP_V4_CLASSES=0x40 - Expected: Throughput ≈ 28–29M ops/s (same as v1) Build: - ビルド成功（警告のみ） - Backward compatible, alloc/free stubs fall back to pool v1 Sanity: - C6-heavy with v4 opt-in: segv/assert なし - page_meta_of() lookup working correctly - Performance unchanged (expected for stub phase) Status: - C6-only v4 route now available via ENV opt-in - Phase v4-mid-2: SmallHeapCtx v4 full implementation with A/B 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 23:37:45 +09:00
Moe Charm (CI)	e3e4cab833	Cleanup: Unify type naming and Cold Iface architecture Refactoring: - Type naming: Rename small_page_v4 → SmallPageMeta, small_class_heap_v4 → SmallClassHeap, small_heap_ctx_v4 → SmallHeapCtx - Keep backward compatibility aliases for existing code - SmallSegment struct unified, clean forward declarations - Cold Iface: Remove vtable (SmallColdIfaceV4 struct) in favor of direct function calls - Simplify refill_page/retire_page to direct calls, not callbacks - smallobject_hotbox_v4.c: Update to call small_cold_v4_* functions directly Documentation: - Add docs/analysis/ENV_CLEANUP_CANDIDATES.md - Categorize ENVs: KEEP (production), RESEARCH (opt-in), DELETE (obsolete) - v2 code: Keep as research infrastructure (complete, safe, gated) - v4 code: Research scaffold for future mid-level allocator Build: - ビルド成功（警告のみ） - Backward compatible, all existing code still works 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 23:30:32 +09:00
Moe Charm (CI)	52c65da783	Phase v4-mid-0: Small-object v4 型・IF 足場（箱化モジュール化） - SmallHeapCtx/SmallPageMeta/SmallClassHeap typedef alias 追加 - SmallSegment struct (base/num_pages/owner_tid/magic) を smallsegment_v4_box.h に定義 - SmallColdIface_v4 direct function prototypes (refill/retire/remote_push/drain) - smallobject_hotbox_v4.c の internal/public API 分離（small_segment_v4_internal） - direct function stubs 実装（SmallColdIfaceV4 delegate 形式） - ENV OFF デフォルト（ENABLED=0/CLASSES=0）で既存挙動 100% 不変 - ビルド成功・sanity 確認（mixed/C6-heavy、segv/assert なし） - CURRENT_TASK.md に Phase v4-mid-0 記録 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 23:23:07 +09:00
Moe Charm (CI)	2a13478dc7	Optimize C6 heavy and C7 ultra performance analysis with refined design refinements - Update environment profile presets and visibility analysis - Enhance small object and tiny segment v4 box implementations - Refine C7 ultra and C6 heavy allocation strategies - Add comprehensive performance metrics and design documentation 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 22:57:26 +09:00
Moe Charm (CI)	9460785bd6	Enable C7 ULTRA segment path by default	2025-12-10 22:25:24 +09:00
Moe Charm (CI)	bbb55b018a	Add C7 ULTRA segment skeleton and TLS freelist	2025-12-10 22:19:32 +09:00
Moe Charm (CI)	f2ce7256cd	Add v4 C7/C6 fast classify and small-segment v4 scaffolding	2025-12-10 19:14:38 +09:00
Moe Charm (CI)	3261025995	Phase v4-4: pilot C6 v4 route with opt-in gate	2025-12-10 18:18:05 +09:00
Moe Charm (CI)	7be30c0b5a	Avoid full-list scans for C7 v4 and tighten partial reuse	2025-12-10 18:04:32 +09:00
Moe Charm (CI)	860d934d71	Tune C7 v4 partial reuse for mixed perf	2025-12-10 18:03:28 +09:00
Moe Charm (CI)	cbd33511eb	Phase v4-3.1: reuse C7 v4 pages and record prep calls	2025-12-10 17:58:42 +09:00
Moe Charm (CI)	406a2f4d26	Incremental improvements: mid_desc cache, pool hotpath optimization, and doc updates Changes: - core/box/pool_api.inc.h: Code organization and micro-optimizations - CURRENT_TASK.md: Updated Phase MD1 (mid_desc TLS cache: +3.2% for C6-heavy) - docs/analysis files: Various analysis and documentation updates - AGENTS.md: Agent role clarifications - TINY_FRONT_V3_FLATTENING_GUIDE.md: Flattening strategy documentation Verification: - random_mixed_hakmem: 44.8M ops/s (1M iterations, 400 working set) - No segfaults or assertions across all benchmark variants - Stable performance across multiple runs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 14:00:57 +09:00
Moe Charm (CI)	0e5a2634bc	Phase 82 Final: Documentation of mid_desc race fix and comprehensive A/B results Implementation Summary: - Early `mid_desc_init_once()` in `hak_pool_init_impl()` prevents uninitialized mutex crash - Eliminates race condition that caused C7_SAFE + flatten crashes - Enables safe operation across all profiles (C7_SAFE, LEGACY) Benchmark Results (C6_HEAVY_LEGACY_POOLV1, Release): - Phase 1 (Baseline): 3.03M / 14.86M / 26.67M ops/s (10K/100K/1M) - Phase 2 (Zero Mode): +5.0% / -2.7% / -0.2% - Phase 3 (Flatten): +3.7% / +6.1% / -5.0% - Phase 4 (Combined): -5.1% / +8.8% / +2.0% (best at 100K: +8.8%) - Phase 5 (C7_SAFE Safety): NO CRASH ✅ (all iterations stable) Mainline Policy: - mid_desc initialization: Always enabled (crash prevention) - Flatten: Default OFF (bench opt-in via HAKMEM_POOL_V1_FLATTEN_ENABLED=1) - Zero Mode: Default FULL (bench opt-in via HAKMEM_POOL_ZERO_MODE=header) - Workload-specific: Medium (100K) benefits most (+8.8%) Documentation Updated: - CURRENT_TASK.md: Added Phase 82 conclusions with benchmark table - MID_LARGE_CPU_HOTPATH_ANALYSIS.md: Added Phase 82 Final with workload analysis 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 09:35:18 +09:00
Moe Charm (CI)	ae056e26ae	Phase ML1 refactoring: Code readability and warnings cleanup - Add (void) casts for unused timespec/profiling variables - Split multi-statement lines in pool_free_fast functions for clarity - Mark pool_hotbox_v2_pop_partial as __attribute__((unused)) - Verified functionality with HAKMEM_POOL_ZERO_MODE=header optimization - Performance stable: +16.1% improvement in header mode (10K iterations) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 09:15:24 +09:00
Moe Charm (CI)	acc64f2438	Phase ML1: Pool v1 memset 89.73% overhead 軽量化 (+15.34% improvement) ## Summary - ChatGPT により bench_profile.h の setenv segfault を修正（RTLD_NEXT 経由に切り替え） - core/box/pool_zero_mode_box.h 新設：ENV キャッシュ経由で ZERO_MODE を統一管理 - core/hakmem_pool.c で zero mode に応じた memset 制御（FULL/header/off） - A/B テスト結果：ZERO_MODE=header で +15.34% improvement（1M iterations, C6-heavy） ## Files Modified - core/box/pool_api.inc.h: pool_zero_mode_box.h include - core/bench_profile.h: glibc setenv → malloc+putenv（segfault 回避） - core/hakmem_pool.c: zero mode 参照・制御ロジック - core/box/pool_zero_mode_box.h (新設): enum/getter - CURRENT_TASK.md: Phase ML1 結果記載 ## Test Results \| Iterations \| ZERO_MODE=full \| ZERO_MODE=header \| Improvement \| \|-----------\|----------------\|-----------------\|------------\| \| 10K \| 3.06 M ops/s \| 3.17 M ops/s \| +3.65% \| \| 1M \| 23.71 M ops/s \| 27.34 M ops/s \| +15.34% \| 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 09:08:18 +09:00
Moe Charm (CI)	a905e0ffdd	Guard madvise ENOMEM and stabilize pool/tiny front v3	2025-12-09 21:50:15 +09:00
Moe Charm (CI)	e274d5f6a9	pool v1 flatten: break down free fallback causes and normalize mid_desc keys	2025-12-09 19:34:54 +09:00
Moe Charm (CI)	8f18963ad5	Phase 36-37: TinyHotHeap v2 HotBox redesign and C7 current_page policy fixes - Redefine TinyHotHeap v2 as per-thread Hot Box with clear boundaries - Add comprehensive OS statistics tracking for SS allocations - Implement route-based free handling for TinyHeap v2 - Add C6/C7 debugging and statistics improvements - Update documentation with implementation guidelines and analysis - Add new box headers for stats, routing, and front-end management	2025-12-08 21:30:21 +09:00
Moe Charm (CI)	34a8fd69b6	C7 v2: add lease helpers and v2 page reset	2025-12-08 14:40:03 +09:00
Moe Charm (CI)	9502501842	Fix tiny lane success handling for TinyHeap routes	2025-12-07 23:06:50 +09:00
Moe Charm (CI)	a6991ec9e4	Add TinyHeap class mask and extend routing	2025-12-07 22:49:28 +09:00
Moe Charm (CI)	9c68073557	C7 meta-light delta flush threshold and clamp	2025-12-07 22:42:02 +09:00
Moe Charm (CI)	fda6cd2e67	Boxify superslab registry, add bench profile, and document C7 hotpath experiments	2025-12-07 03:12:27 +09:00
Moe Charm (CI)	18faa6a1c4	Add OBSERVE stats and auto tiny policy profile	2025-12-06 01:44:05 +09:00
Moe Charm (CI)	03538055ae	Restore C7 Warm/TLS carve for release and add policy scaffolding	2025-12-06 01:34:04 +09:00
Moe Charm (CI)	d17ec46628	Fix C7 warm/TLS Release path and unify debug instrumentation	2025-12-05 23:41:01 +09:00
Moe Charm (CI)	e96e9a4bf9	Feat: Add TLS carve experiment for warm C7	2025-12-05 20:50:24 +09:00
Moe Charm (CI)	3e1d7c3798	Fix debug build after clean reset	2025-12-05 20:43:14 +09:00
Moe Charm (CI)	4c986fa9d1	Feat: Add experimental TLS Bind Box path in Unified Cache - Added experimental path in unified_cache_refill to test ss_tls_bind_one for C7 class. - Guarded by HAKMEM_WARM_TLS_BIND_C7 env var and debug build. - Updated Page Box comments to clarify future TLS Bind Box integration.	2025-12-05 20:05:11 +09:00
Moe Charm (CI)	45b2ccbe45	Refactor: Extract TLS Bind Box for unified slab binding - Created core/box/ss_tls_bind_box.h containing ss_tls_bind_one(). - Refactored superslab_refill() to use the new box. - Updated signatures to avoid circular dependencies (tiny_self_u32). - Added future integration points for Warm Pool and Page Box.	2025-12-05 19:57:30 +09:00
Moe Charm (CI)	093f362231	Add Page Box layer for C7 class optimization - Implement tiny_page_box.c/h: per-thread page cache between UC and Shared Pool - Integrate Page Box into Unified Cache refill path - Remove legacy SuperSlab implementation (merged into smallmid) - Add HAKMEM_TINY_PAGE_BOX_CLASSES env var for selective class enabling - Update bench_random_mixed.c with Page Box statistics Current status: Implementation safe, no regressions. Page Box ON/OFF shows minimal difference - pool strategy needs tuning. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-05 15:31:44 +09:00
Moe Charm (CI)	141b121e9c	Phase 1: Warm Pool Capacity Increase (16 → 12 with matching threshold) Key Changes: - Reduced static capacity from 16 to 12 SuperSlabs per class - Fixed prefill threshold from hardcoded 4 to match capacity (12) - Updated environment variable clamping to [1,12] - This allows warm pool to actually utilize its full capacity Performance: - Baseline (post-unified-cache-opt): 4.76M ops/s - After Phase 1: 4.84M ops/s - Improvement: +1.6% (expected +15-20%) Note: Actual improvement lower than expected because the warm pool bottleneck is only part of the overall allocation path. Unified cache optimization (+14.9%) already addressed much of the registry scan overhead. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-05 12:16:39 +09:00
Moe Charm (CI)	a04e3ba0e9	Optimize Unified Cache: Batch Freelist Validation + TLS Alignment Two complementary optimizations to improve unified cache hot path performance: 1. Batch Freelist Validation (core/front/tiny_unified_cache.c) - Remove duplicate per-block freelist validation in release builds - Consolidated validation logic into unified_refill_validate_base() function - Previously: hak_super_lookup(p) called on EVERY freelist block (~128 blocks) - Now: Single validation function at batch start - Impact (RELEASE): Eliminates 50-100 cycles per block × 128 = 1,280-2,560 cycles/refill - Impact (DEBUG): Full validation still available via unified_refill_validate_base() - Safety: Block integrity protected by header magic (0xA0 \| class_idx) 2. TLS Unified Cache Alignment (core/front/tiny_unified_cache.h) - Add __attribute__((aligned(64))) to TinyUnifiedCache struct - Aligns each per-class cache to 64-byte cache line boundary - Eliminates false sharing across classes (8 classes × 64B = 512B per thread) - Prevents cache line thrashing on concurrent class access - Fields stay same size (16B data + 48B padding), no binary compatibility issues - Requires clean rebuild due to struct size change (16B → 64B) Performance Expectations (projected, pending clean build measurement): - random_mixed (256B working set): +15-20% throughput gain - tiny_hot: No regression (already cache-friendly) - tiny_malloc: +3-5% throughput gain Benchmark Results (after clean rebuild): - Target: 4.3M → 5.0M ops/s (+17%) - tiny_hot: Maintain 150M+ ops/s (no regression) Code Quality: - ✅ Proper separation of concerns (validation logic centralized) - ✅ Clean compile-time gating with #if HAKMEM_BUILD_RELEASE - ✅ Memory-safe (all access patterns unchanged) - ✅ Maintainable (single source of truth for validation) Testing Required: - [ ] Clean rebuild (make clean && make bench_random_mixed_hakmem) - [ ] Performance measurement with consistent parameters - [ ] Debug build validation test (ensure corruption detection still works) - [ ] Multi-threaded correctness (TLS alignment safe for MT) 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: ChatGPT (optimization implementation)	2025-12-05 11:32:07 +09:00
Moe Charm (CI)	cd3280eee7	Implement MADV_POPULATE_WRITE fix for SuperSlab allocation Add support for MADV_POPULATE_WRITE (Linux 5.14+) to force page population AFTER munmap trimming in SuperSlab fallback path. Changes: 1. core/box/ss_os_acquire_box.c (lines 171-201): - Apply MADV_POPULATE_WRITE after munmap prefix/suffix trim - Fallback to explicit page touch for kernels < 5.14 - Always cleanup suffix region (remove MADV_DONTNEED path) 2. core/superslab_cache.c (lines 111-121): - Use MADV_POPULATE_WRITE instead of memset for efficiency - Fallback to memset if madvise fails Testing Results: - Page faults: Unchanged (~145K per 1M ops) - Throughput: -2% (4.18M → 4.10M ops/s with HAKMEM_SS_PREFAULT=1) - Root cause: 97.6% of page faults are from libc memset in initialization, not from SuperSlab memory access Conclusion: MADV_POPULATE_WRITE is effective for SuperSlab memory, but overall page fault bottleneck comes from TLS/shared pool initialization. Startup warmup remains the most effective solution (already implemented in bench_random_mixed.c with +9.5% improvement). 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-05 10:42:47 +09:00

1 2 3 4 5 ...

480 Commits