hakmem

Author	SHA1	Message	Date
Moe Charm (CI)	2f5d53fd6d	Phase v5-5: TLS cache for C6 v5 Add 1-slot TLS cache to C6 v5 to reduce page_meta access overhead. Implementation: - Add HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED ENV (default: 0) - SmallHeapCtxV5: add c6_cached_block field for TLS cache - alloc: cache hit bypasses page_meta lookup, returns immediately - free: empty cache stores block, full cache evicts old block first Results (1M iter, ws=400, HEADER_MODE=full): - C6-heavy (257-768B): 35.53M → 37.02M ops/s (+4.2%) - Mixed 16-1024B: 38.04M → 37.93M ops/s (-0.3%, noise) Known issue: header_mode=light has infinite loop bug (freelist pointer/header collision). Full mode only for now. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 07:40:22 +09:00
Moe Charm (CI)	2a548875b8	Phase v5-4: Header light mode & freelist optimization Implements header write optimization for C6 v5 allocator by moving header initialization from per-alloc time to carve time (during page refill). This eliminates redundant header writes on the hot path. Implementation: - Added HAKMEM_SMALL_HEAP_V5_HEADER_MODE ENV (full|light, default: full) - Added header_mode field to SmallHeapCtxV5 (cached per-thread) - Modified alloc fast/slow paths to skip header write in light mode - Modified refill to write headers during carve in light mode - Free path unchanged (header validation still works) Benchmark Results (2M iterations, ws=400): C6-HEAVY (257-768B): - Baseline (v5 OFF): 47.95 Mops/s - v5 full mode: 38.97 Mops/s (-18.7% vs baseline) - v5 light mode: 39.25 Mops/s (-18.1% vs baseline, +0.7% vs full) MIXED 16-1024B: - v5 OFF: 43.59 Mops/s - v5 full mode: 36.53 Mops/s (-16.2% vs OFF) - v5 light mode: 38.04 Mops/s (-12.7% vs OFF, +4.1% vs full) Analysis: - Light mode shows modest improvement over full (+0.7-4.1%) - C6 v5 performance gap vs baseline (-18%) indicates need for further optimization beyond header writes - Mixed workload benefits more from light mode (+4.1% vs full) - No regressions in safety/correctness observed Research findings: - Header write optimization alone insufficient to close v5 gap - Need to investigate other hot path costs (freelist ops, metadata access) - Light mode validates the carve-time header concept 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 05:12:39 +09:00
Moe Charm (CI)	7b5ee8cee2	Phase v5-3: O(1) path optimization for C6-only v5 - Single TLS segment (eliminates slot search loop) - O(1) page_meta_of() (direct segment range check, no iteration) - __builtin_ctz for O(1) free page finding in bitmap - Simplified free path using page_meta_of() only (no find_page) - Partial limit 1 (minimal list traversal) Performance: - Before (v5-2): 14.7M ops/s - After (v5-3): 38.5M ops/s (+162%) - vs baseline: 44.9M ops/s (-14%) - SEGV: None, stable at ws=800 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 04:33:16 +09:00
Moe Charm (CI)	4c2869397f	Phase v5-3: SmallObject v5 constants/macros boxing refactor Improvements: - Unify constants in box.h (C6_CLASS_IDX, BLOCK_SIZE, PARTIAL_LIMIT) - Convert list helpers to macros (SMALL_PAGE_V5_PUSH_PARTIAL, etc.) - Remove duplicate functions (page_push_partial, etc.) - Move page_loc_t enum to box.h Effects: - hotbox_v5.c: 339 lines → 263 lines (76 lines removed) - Code duplication eliminated (managed via macros) - Better future extensibility - Type safety preserved (using GCC statement expressions) Tests: - Build succeeds - Verified with v5 both OFF and ON - No performance change (refactoring only) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 04:24:20 +09:00
Moe Charm (CI)	e0fb7d550a	Phase v5-2: SmallObject v5 C6-only full implementation (WIP - header fix) Implementation fixes: - Add tiny_region_id_write_header(): returns the USER pointer correctly - Segment search from TLS slots (page_meta_of) - Reuse segments via page-level allocation - Guarantee 2MiB alignment (reserve 4MiB + align) - Fix free-path routing (remove fallthrough from v4 to v5) Verification: - SEGV gone: basic alloc/free works - Performance: ~18-20M ops/s (about 40-45% of the 43-47M baseline) - Regression cause: O(n) linear search of TLS slots, O(n) find_page Remaining tasks: - O(1) segment lookup optimization (hash or direct array indexing) - Remove find_page (when segment lookup succeeds) - Optimize partial_count/list management ENV defaults to OFF, so mainline is unaffected. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-11 04:14:51 +09:00
Moe Charm (CI)	9c24bebf08	Phase v5-1: Connect SmallObject v5 C6-only route stub - tiny_route_env_box.h: add TINY_ROUTE_SMALL_HEAP_V5 enum; branch C6→v5 in the route snapshot - malloc_tiny_fast.h: add v5 case to the alloc/free switch (v1/pool fallback) - smallobject_hotbox_v5.c: stub implementation (alloc returns NULL, free is a no-op) - smallobject_hotbox_v5_box.h: add ctx parameter to function signatures - Makefile: add core/smallobject_hotbox_v5.o to the link list - ENV_PROFILE_PRESETS.md: add v5-1 preset - CURRENT_TASK.md: record Phase v5-1 completion Characteristics: - ENV: opt in with HAKMEM_SMALL_HEAP_V5_ENABLED=1 / HAKMEM_SMALL_HEAP_V5_CLASSES=0x40 - Test results: C6-heavy (v5 OFF 15.5M → v5 ON 16.4M ops/s, normal), Mixed 47.2M ops/s, no SEGV/assert - Behavior is the same as the v1/pool fallback (real implementation comes in v5-2) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 03:25:37 +09:00
Moe Charm (CI)	dedfea27d5	Phase v5-0 refactor: ENV unification, macro conversion, struct optimization - Unify ENV initialization with a sentinel pattern - Add ENV_UNINIT/ENABLED/DISABLED constants - Optimize the initialization check with __builtin_expect - Switch small_heap_v5_enabled/class_mask to the unified pattern - Pointer macros (O(1) segment/page computation) - SMALL_SEGMENT_V5_BASE_FROM_PTR: compute segment base from ptr via mask - SMALL_SEGMENT_V5_PAGE_IDX: compute page_idx within the segment via shift - SMALL_SEGMENT_V5_PAGE_META: O(1) access to page_meta (with bounds check) - SMALL_SEGMENT_V5_VALIDATE_MAGIC: magic validation - SMALL_SEGMENT_V5_VALIDATE_PTR: Fail-Fast validation pipeline - Add partial_count to SmallClassHeapV5 - Add a counter for the partial page list (for refill/retire optimization) - Reorder SmallPageMetaV5 fields (L1 cache optimization) - Group hot fields (free_list, used, capacity) at the front - Place metadata (class_idx, flags, page_idx, segment) at the back - 24B total, offset comments added - Add route priority ENV - HAKMEM_ROUTE_PRIORITY={v4|v5|auto} (default: v4) - Define enum small_route_priority - Add small_route_priority() function - Add segment_size override ENV - HAKMEM_SMALL_HEAP_V5_SEGMENT_SIZE (default: 2MiB) - power of 2 & >= 64KiB validation Behavior: completely unchanged (v5 route is never called, ENV default OFF) Tests: Mixed 16–1024B at 43.0–43.8M ops/s (no change), no SEGV/assert 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 03:19:18 +09:00
Moe Charm (CI)	83d4096fbc	Phase v5-0: Add SmallObject v5 design and type/IF/ENV skeleton Design document: - docs/analysis/SMALLOBJECT_V5_DESIGN.md: overall v5 architecture design New files (v5 skeleton): - core/box/smallobject_hotbox_v5_box.h: HotBox v5 type definitions - core/box/smallsegment_v5_box.h: Segment v5 type definitions - core/box/smallobject_cold_iface_v5.h: ColdIface v5 IF declarations - core/box/smallobject_v5_env_box.h: ENV gate - core/smallobject_hotbox_v5.c: implementation stub (full fallback) Features: ✅ Types and interfaces only (v5-0 has no functionality) ✅ ENV default OFF (HAKMEM_SMALL_HEAP_V5_ENABLED=0) ✅ Behavior completely unchanged (confirmed with Mixed/C6 benchmarks) ✅ Clear distinction from v4 (*_v5 suffix) ✅ Ready for staged implementation: v5-1 (stub) → v5-2 (full implementation) → v5-3 (Mixed) Phases: - v5-0: type definitions only (current) - v5-1: add C6-only stub route - v5-2: full Segment/HotBox implementation (C6-only bench A/B) - v5-3: staged promotion on Mixed (C6 → C5 → ...) Target performance: 50–60M ops/s on Mixed 16–1024B (50% of mimalloc) 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 03:09:57 +09:00
Moe Charm (CI)	bdfa32d869	Phase v4-mid-SEGV: Switch C6 v4 to SmallSegment-only, resolving the TinyHeap SEGV Problem: C6 v4 shared TinyHeap pages, which corrupted the freelist at iters >= 800k → SEGV Fixes: - c6_segment_alloc_page_direct(): C6-dedicated page allocation (via SmallSegment v4, not shared with TinyHeap) - c6_segment_release_page_direct(): C6-dedicated page release - Branch for C6 in cold_refill_page_v4(): use SmallSegment directly - Branch for C6 in cold_retire_page_v4(): return directly to SmallSegment - Add fastlist state reset handling (L392-399) Results: ✅ SEGV gone at iters=1M, ws <= 390 ✅ C6-only: v4 OFF ~47M → v4 ON ~43M ops/s (−8.5%, stable) ✅ Mixed: no SEGV with v4 ON (small regression accepted) Policy: C6 v4 is now stabilized as a research box. It will not go on the mainline (existing mid/pool v1 is used). 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-11 02:39:32 +09:00
Moe Charm (CI)	e486dd2c55	Phase v4-mid-6: Implement C6 v4 TLS Fastlist (Gated) - Implemented TLS fastlist logic for C6 in smallobject_hotbox_v4.c (alloc/free). - Added SmallC6FastState struct and g_small_c6_fast TLS variable. - Gated the fastlist logic with HAKMEM_SMALL_HEAP_V4_FASTLIST (default OFF) due to observed instability in mixed workloads. - Fixed a memory leak in small_heap_free_fast_v4 fallback path by calling hak_pool_free. - Updated CURRENT_TASK.md with phase report.	2025-12-11 01:44:08 +09:00
Moe Charm (CI)	dd974b49c5	Phase v4-mid-2, v4-mid-3, v4-mid-5: SmallObject HotBox v4 implementation and docs update Implementation: - SmallObject HotBox v4 (core/smallobject_hotbox_v4.c) now fully implements C6-only allocations and frees, including current/partial management and freelist operations. - Cold Iface (tiny_heap based) for page refill/retire is integrated. - Stats instrumentation (v4-mid-5) added to small_heap_alloc_fast_v4 and small_heap_free_fast_v4, with a new header file core/box/smallobject_hotbox_v4_stats_box.h and atexit dump function. Updates: - CURRENT_TASK.md has been condensed and updated with summaries of Phase v4-mid-2 (C6-only v4), Phase v4-mid-3 (C5-only v4 pilot), and the stats implementation (v4-mid-5). - docs/analysis/SMALLOBJECT_V4_BOX_DESIGN.md updated with A/B results and conclusions for C6-only and C5-only v4 implementations. - The previous CURRENT_TASK.md content has been archived to CURRENT_TASK_ARCHIVE_20251210.md.	2025-12-11 01:01:15 +09:00
Moe Charm (CI)	3b4449d773	Phase v4-mid-1: C6-only v4 route + page_meta_of() Fail-Fast validation Implementation: - SMALL_SEGMENT_V4_* constants (SIZE=2MiB, PAGE_SIZE=64KiB, MAGIC=0xDEADBEEF) - smallsegment_v4_page_meta_of(): O(1) mask+shift lookup with magic validation - Computes segment base: addr & ~(2MiB - 1) - Verifies SmallSegment magic number - Calculates page_idx: (addr - seg_base) >> PAGE_SHIFT (16) - Returns non-NULL sentinel for now (full page_meta[] in Phase v4-mid-2) Stubs for C6-only phase: - small_heap_alloc_fast_v4(): C6 returns NULL → pool v1 fallback - small_heap_free_fast_v4(): C6 calls page_meta_of() for Fail-Fast, then pool v1 fallback Documentation: - ENV_PROFILE_PRESETS.md: Add "C6_ONLY_SMALLOBJECT_V4" research profile - HAKMEM_SMALL_HEAP_V4_ENABLED=1, HAKMEM_SMALL_HEAP_V4_CLASSES=0x40 - Expected: Throughput ≈ 28–29M ops/s (same as v1) Build: - Build succeeds (warnings only) - Backward compatible, alloc/free stubs fall back to pool v1 Sanity: - C6-heavy with v4 opt-in: no segv/assert - page_meta_of() lookup working correctly - Performance unchanged (expected for stub phase) Status: - C6-only v4 route now available via ENV opt-in - Phase v4-mid-2: SmallHeapCtx v4 full implementation with A/B 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 23:37:45 +09:00
Moe Charm (CI)	e3e4cab833	Cleanup: Unify type naming and Cold Iface architecture Refactoring: - Type naming: Rename small_page_v4 → SmallPageMeta, small_class_heap_v4 → SmallClassHeap, small_heap_ctx_v4 → SmallHeapCtx - Keep backward compatibility aliases for existing code - SmallSegment struct unified, clean forward declarations - Cold Iface: Remove vtable (SmallColdIfaceV4 struct) in favor of direct function calls - Simplify refill_page/retire_page to direct calls, not callbacks - smallobject_hotbox_v4.c: Update to call small_cold_v4_* functions directly Documentation: - Add docs/analysis/ENV_CLEANUP_CANDIDATES.md - Categorize ENVs: KEEP (production), RESEARCH (opt-in), DELETE (obsolete) - v2 code: Keep as research infrastructure (complete, safe, gated) - v4 code: Research scaffold for future mid-level allocator Build: - Build succeeds (warnings only) - Backward compatible, all existing code still works 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 23:30:32 +09:00
Moe Charm (CI)	52c65da783	Phase v4-mid-0: Small-object v4 type/IF scaffolding (boxed modularization) - Add SmallHeapCtx/SmallPageMeta/SmallClassHeap typedef aliases - Define SmallSegment struct (base/num_pages/owner_tid/magic) in smallsegment_v4_box.h - SmallColdIface_v4 direct function prototypes (refill/retire/remote_push/drain) - Separate internal/public API of smallobject_hotbox_v4.c (small_segment_v4_internal) - Implement direct function stubs (SmallColdIfaceV4 delegate style) - ENV OFF by default (ENABLED=0/CLASSES=0), existing behavior 100% unchanged - Build succeeds, sanity checked (mixed/C6-heavy, no segv/assert) - Record Phase v4-mid-0 in CURRENT_TASK.md 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 23:23:07 +09:00
Moe Charm (CI)	2a13478dc7	Optimize C6 heavy and C7 ultra performance analysis with refined design refinements - Update environment profile presets and visibility analysis - Enhance small object and tiny segment v4 box implementations - Refine C7 ultra and C6 heavy allocation strategies - Add comprehensive performance metrics and design documentation 🤖 Generated with Claude Code Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 22:57:26 +09:00
Moe Charm (CI)	9460785bd6	Enable C7 ULTRA segment path by default	2025-12-10 22:25:24 +09:00
Moe Charm (CI)	bbb55b018a	Add C7 ULTRA segment skeleton and TLS freelist	2025-12-10 22:19:32 +09:00
Moe Charm (CI)	49a1fe8416	Add perf/benchmark measurement principles for hot path changes	2025-12-10 19:49:44 +09:00
Moe Charm (CI)	f2ce7256cd	Add v4 C7/C6 fast classify and small-segment v4 scaffolding	2025-12-10 19:14:38 +09:00
Moe Charm (CI)	3261025995	Phase v4-4: pilot C6 v4 route with opt-in gate	2025-12-10 18:18:05 +09:00
Moe Charm (CI)	7be30c0b5a	Avoid full-list scans for C7 v4 and tighten partial reuse	2025-12-10 18:04:32 +09:00
Moe Charm (CI)	860d934d71	Tune C7 v4 partial reuse for mixed perf	2025-12-10 18:03:28 +09:00
Moe Charm (CI)	cbd33511eb	Phase v4-3.1: reuse C7 v4 pages and record prep calls	2025-12-10 17:58:42 +09:00
Moe Charm (CI)	31dd1e19d7	Document that dev machine/env are kept constant across sessions	2025-12-10 15:19:10 +09:00
Moe Charm (CI)	677030d699	Document new Mixed baseline and C7 header dedup A/B	2025-12-10 14:38:49 +09:00
Moe Charm (CI)	d576116484	Document current Mixed baseline throughput and ENV profile	2025-12-10 14:12:13 +09:00
Moe Charm (CI)	406a2f4d26	Incremental improvements: mid_desc cache, pool hotpath optimization, and doc updates Changes: - core/box/pool_api.inc.h: Code organization and micro-optimizations - CURRENT_TASK.md: Updated Phase MD1 (mid_desc TLS cache: +3.2% for C6-heavy) - docs/analysis files: Various analysis and documentation updates - AGENTS.md: Agent role clarifications - TINY_FRONT_V3_FLATTENING_GUIDE.md: Flattening strategy documentation Verification: - random_mixed_hakmem: 44.8M ops/s (1M iterations, 400 working set) - No segfaults or assertions across all benchmark variants - Stable performance across multiple runs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 14:00:57 +09:00
Moe Charm (CI)	0e5a2634bc	Phase 82 Final: Documentation of mid_desc race fix and comprehensive A/B results Implementation Summary: - Early `mid_desc_init_once()` in `hak_pool_init_impl()` prevents uninitialized mutex crash - Eliminates race condition that caused C7_SAFE + flatten crashes - Enables safe operation across all profiles (C7_SAFE, LEGACY) Benchmark Results (C6_HEAVY_LEGACY_POOLV1, Release): - Phase 1 (Baseline): 3.03M / 14.86M / 26.67M ops/s (10K/100K/1M) - Phase 2 (Zero Mode): +5.0% / -2.7% / -0.2% - Phase 3 (Flatten): +3.7% / +6.1% / -5.0% - Phase 4 (Combined): -5.1% / +8.8% / +2.0% (best at 100K: +8.8%) - Phase 5 (C7_SAFE Safety): NO CRASH ✅ (all iterations stable) Mainline Policy: - mid_desc initialization: Always enabled (crash prevention) - Flatten: Default OFF (bench opt-in via HAKMEM_POOL_V1_FLATTEN_ENABLED=1) - Zero Mode: Default FULL (bench opt-in via HAKMEM_POOL_ZERO_MODE=header) - Workload-specific: Medium (100K) benefits most (+8.8%) Documentation Updated: - CURRENT_TASK.md: Added Phase 82 conclusions with benchmark table - MID_LARGE_CPU_HOTPATH_ANALYSIS.md: Added Phase 82 Final with workload analysis 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 09:35:18 +09:00
Moe Charm (CI)	ae056e26ae	Phase ML1 refactoring: Code readability and warnings cleanup - Add (void) casts for unused timespec/profiling variables - Split multi-statement lines in pool_free_fast functions for clarity - Mark pool_hotbox_v2_pop_partial as __attribute__((unused)) - Verified functionality with HAKMEM_POOL_ZERO_MODE=header optimization - Performance stable: +16.1% improvement in header mode (10K iterations) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 09:15:24 +09:00
Moe Charm (CI)	acc64f2438	Phase ML1: Reduce Pool v1 memset overhead of 89.73% (+15.34% improvement) ## Summary - Fixed the setenv segfault in bench_profile.h with help from ChatGPT (switched to going through RTLD_NEXT) - New core/box/pool_zero_mode_box.h: unified management of ZERO_MODE via an ENV cache - memset control in core/hakmem_pool.c according to zero mode (FULL/header/off) - A/B test result: +15.34% improvement with ZERO_MODE=header (1M iterations, C6-heavy) ## Files Modified - core/box/pool_api.inc.h: include pool_zero_mode_box.h - core/bench_profile.h: glibc setenv → malloc+putenv (avoids segfault) - core/hakmem_pool.c: zero mode lookup and control logic - core/box/pool_zero_mode_box.h (new): enum/getter - CURRENT_TASK.md: record Phase ML1 results ## Test Results (Iterations: ZERO_MODE=full / ZERO_MODE=header / Improvement) - 10K: 3.06 M ops/s / 3.17 M ops/s / +3.65% - 1M: 23.71 M ops/s / 27.34 M ops/s / +15.34% 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2025-12-10 09:08:18 +09:00
Moe Charm (CI)	a905e0ffdd	Guard madvise ENOMEM and stabilize pool/tiny front v3	2025-12-09 21:50:15 +09:00
Moe Charm (CI)	e274d5f6a9	pool v1 flatten: break down free fallback causes and normalize mid_desc keys	2025-12-09 19:34:54 +09:00
Moe Charm (CI)	8f18963ad5	Phase 36-37: TinyHotHeap v2 HotBox redesign and C7 current_page policy fixes - Redefine TinyHotHeap v2 as per-thread Hot Box with clear boundaries - Add comprehensive OS statistics tracking for SS allocations - Implement route-based free handling for TinyHeap v2 - Add C6/C7 debugging and statistics improvements - Update documentation with implementation guidelines and analysis - Add new box headers for stats, routing, and front-end management	2025-12-08 21:30:21 +09:00
Moe Charm (CI)	34a8fd69b6	C7 v2: add lease helpers and v2 page reset	2025-12-08 14:40:03 +09:00
Moe Charm (CI)	9502501842	Fix tiny lane success handling for TinyHeap routes	2025-12-07 23:06:50 +09:00
Moe Charm (CI)	a6991ec9e4	Add TinyHeap class mask and extend routing	2025-12-07 22:49:28 +09:00
Moe Charm (CI)	9c68073557	C7 meta-light delta flush threshold and clamp	2025-12-07 22:42:02 +09:00
Moe Charm (CI)	fda6cd2e67	Boxify superslab registry, add bench profile, and document C7 hotpath experiments	2025-12-07 03:12:27 +09:00
Moe Charm (CI)	18faa6a1c4	Add OBSERVE stats and auto tiny policy profile	2025-12-06 01:44:05 +09:00
Moe Charm (CI)	03538055ae	Restore C7 Warm/TLS carve for release and add policy scaffolding	2025-12-06 01:34:04 +09:00
Moe Charm (CI)	d17ec46628	Fix C7 warm/TLS Release path and unify debug instrumentation	2025-12-05 23:41:01 +09:00
Moe Charm (CI)	96c2988381	Bench: add C7-only mode for warm TLS tests	2025-12-05 20:56:20 +09:00
Moe Charm (CI)	e96e9a4bf9	Feat: Add TLS carve experiment for warm C7	2025-12-05 20:50:24 +09:00
Moe Charm (CI)	3e1d7c3798	Fix debug build after clean reset	2025-12-05 20:43:14 +09:00
Moe Charm (CI)	4c986fa9d1	Feat: Add experimental TLS Bind Box path in Unified Cache - Added experimental path in unified_cache_refill to test ss_tls_bind_one for C7 class. - Guarded by HAKMEM_WARM_TLS_BIND_C7 env var and debug build. - Updated Page Box comments to clarify future TLS Bind Box integration.	2025-12-05 20:05:11 +09:00
Moe Charm (CI)	45b2ccbe45	Refactor: Extract TLS Bind Box for unified slab binding - Created core/box/ss_tls_bind_box.h containing ss_tls_bind_one(). - Refactored superslab_refill() to use the new box. - Updated signatures to avoid circular dependencies (tiny_self_u32). - Added future integration points for Warm Pool and Page Box.	2025-12-05 19:57:30 +09:00
Moe Charm (CI)	a67965139f	Add performance analysis reports and archive legacy superslab - Add investigation reports for allocation routing, bottlenecks, madvise - Archive old smallmid superslab implementation - Document Page Box integration findings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-05 15:31:58 +09:00
Moe Charm (CI)	093f362231	Add Page Box layer for C7 class optimization - Implement tiny_page_box.c/h: per-thread page cache between UC and Shared Pool - Integrate Page Box into Unified Cache refill path - Remove legacy SuperSlab implementation (merged into smallmid) - Add HAKMEM_TINY_PAGE_BOX_CLASSES env var for selective class enabling - Update bench_random_mixed.c with Page Box statistics Current status: Implementation safe, no regressions. Page Box ON/OFF shows minimal difference - pool strategy needs tuning. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-05 15:31:44 +09:00
Moe Charm (CI)	2b2b607957	Add workload comparison and madvise investigation reports Key findings from 2025-12-05 session: 1. HAKMEM vs mimalloc: 27x slower (4.5M vs 122M ops/s) 2. Root cause investigation: madvise 1081 calls vs mimalloc 0 calls 3. madvise disable test: -15% performance (worse, not better!) 4. Conclusion: MADV_POPULATE_WRITE is actually helping, not hurting 5. ChatGPT was right: time to move to user-space optimization phase Reports added: - WORKLOAD_COMPARISON_20251205.md - PARTIAL_RELEASE_INVESTIGATION_REPORT_20251205.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-05 13:31:45 +09:00
Moe Charm (CI)	802b1a1764	Add performance analysis reports for 2025-12-05 session Key findings: 1. Warm Pool optimization (+1.6%) - capacity fix deployed 2. PGO optimization (+0.6%) - limited effect due to existing optimizations 3. 16-1024B vs 8-128B performance gap identified: - 8-128B (Tiny only): 88M ops/s (5x faster than previous 16.46M baseline) - 16-1024B (mixed): 4.84M ops/s (needs investigation) 4. Root cause analysis: madvise() (Partial Release) consuming 58% CPU time Reports added: - WARM_POOL_OPTIMIZATION_ANALYSIS_20251205.md - PERF_ANALYSIS_16_1024B_20251205.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-05 13:04:36 +09:00
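
Note on the O(1) lookup in commit 3b4449d773 (Phase v4-mid-1): the message describes computing the segment base with a mask, checking a magic number, and deriving the page index with a shift. A minimal sketch of that idea follows, using the constants quoted in the commit (2MiB segment, 64KiB pages, magic 0xDEADBEEF). The names `DemoSegment` and `demo_page_idx_of` are hypothetical and are not taken from the hakmem sources; the real `smallsegment_v4_page_meta_of()` may differ in signature and return type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SEG_SIZE   ((uintptr_t)2 << 20)   /* 2 MiB segment, as quoted in the commit */
#define PAGE_SHIFT 16                     /* 64 KiB pages */
#define SEG_MAGIC  0xDEADBEEFu

/* Hypothetical segment header placed at the segment base. */
typedef struct {
    uint32_t magic;
} DemoSegment;

/*
 * Returns the page index of p inside its 2 MiB-aligned segment, or -1 if the
 * header magic does not match (fail-fast). Assumes p lies inside memory that
 * was allocated with SEG_SIZE alignment, so reading the header is valid.
 */
static int demo_page_idx_of(const void *p) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = addr & ~(SEG_SIZE - 1);          /* mask: segment base */
    const DemoSegment *seg = (const DemoSegment *)base;
    if (seg->magic != SEG_MAGIC) {
        return -1;                                    /* not a valid segment */
    }
    return (int)((addr - base) >> PAGE_SHIFT);        /* shift: page index */
}
```

Because the segment is aligned to its own size, no table or loop is needed: any interior pointer maps back to the header with one AND, which is what makes the lookup constant-time.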

470 Commits
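
Note on the 1-slot TLS cache in commit 2f5d53fd6d (Phase v5-5): the message says an alloc cache hit returns immediately, a free into an empty cache stores the block, and a free into a full cache evicts the old block first. The sketch below illustrates only that policy. Everything in it is an assumption made for illustration: the names `demo_alloc`, `demo_free`, `slow_alloc`, and `slow_free` are hypothetical, and a plain thread-local free list stands in for the page_meta-based slow path of the real allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block layout: a freed block stores the next pointer in place. */
typedef struct DemoBlock {
    struct DemoBlock *next;
} DemoBlock;

static _Thread_local void      *tls_cached_block; /* the single cache slot */
static _Thread_local DemoBlock *tls_free_list;    /* stand-in for the slow path */
static _Thread_local unsigned   slow_path_hits;   /* counts slow-path entries */

/* Stand-in slow path: push onto a thread-local free list. */
static void slow_free(void *p) {
    DemoBlock *b = (DemoBlock *)p;
    b->next = tls_free_list;
    tls_free_list = b;
    slow_path_hits++;
}

/* Stand-in slow path: pop from the free list (NULL when empty). */
static void *slow_alloc(void) {
    DemoBlock *b = tls_free_list;
    if (b != NULL) {
        tls_free_list = b->next;
    }
    slow_path_hits++;
    return b;
}

/* Alloc: a cache hit returns at once and never touches the slow path. */
static void *demo_alloc(void) {
    void *p = tls_cached_block;
    if (p != NULL) {
        tls_cached_block = NULL;
        return p;
    }
    return slow_alloc();
}

/* Free: an empty slot just stores the block; a full slot evicts the old one first. */
static void demo_free(void *p) {
    if (tls_cached_block != NULL) {
        slow_free(tls_cached_block);
    }
    tls_cached_block = p;
}
```

With this policy a free immediately followed by an alloc on the same thread is served entirely from the slot, which is the pattern the C6-heavy benchmark in the commit would exercise most.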