Commit Graph

9 Commits

Author SHA1 Message Date
8789542a9f Phase v5-7: C6 ULTRA pattern (research mode, 32-slot TLS freelist)
Implementation:
- ENV: HAKMEM_SMALL_HEAP_V5_ULTRA_C6_ENABLED=0|1 (default: 0)
- SmallHeapCtxV5: added c6_tls_freelist[32], c6_tls_count, ultra_c6_enabled
- small_segment_v5_owns_ptr_fast(): lightweight segment check for free path
- small_alloc_slow_v5_c6_refill(): batch TLS fill from page freelist
- small_free_slow_v5_c6_drain(): drain half of TLS to page on overflow
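
The refill/drain behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the actual hakmem code: the `Demo*` types and `demo_*` functions are hypothetical stand-ins, the page freelist is assumed to be a singly linked list threaded through the first word of each free block, and header writes, segment validation, and `page->used` tracking are left out.

```c
#include <assert.h>
#include <stddef.h>

#define C6_TLS_SLOTS 32  /* mirrors c6_tls_freelist[32] from the commit message */

/* Hypothetical, simplified stand-ins for the real v5 page and context types. */
typedef struct DemoPage {
    void *freelist;  /* free blocks, linked through their first word (assumption) */
} DemoPage;

typedef struct DemoCtx {
    void *c6_tls_freelist[C6_TLS_SLOTS];
    unsigned c6_tls_count;
} DemoCtx;

/* Return one block to the page freelist. */
static void demo_page_push(DemoPage *pg, void *blk) {
    *(void **)blk = pg->freelist;
    pg->freelist = blk;
}

/* Take one block from the page freelist, or NULL if it is empty. */
static void *demo_page_pop(DemoPage *pg) {
    void *blk = pg->freelist;
    if (blk != NULL) pg->freelist = *(void **)blk;
    return blk;
}

/* Slow path on alloc: batch-fill the TLS array from the page freelist.
 * Returns the number of blocks moved. */
static unsigned demo_refill(DemoCtx *ctx, DemoPage *pg) {
    unsigned moved = 0;
    while (ctx->c6_tls_count < C6_TLS_SLOTS) {
        void *blk = demo_page_pop(pg);
        if (blk == NULL) break;
        ctx->c6_tls_freelist[ctx->c6_tls_count++] = blk;
        moved++;
    }
    return moved;
}

/* Slow path on free: hand half of the TLS blocks back to the page. */
static void demo_drain_half(DemoCtx *ctx, DemoPage *pg) {
    unsigned keep = ctx->c6_tls_count / 2;
    while (ctx->c6_tls_count > keep)
        demo_page_push(pg, ctx->c6_tls_freelist[--ctx->c6_tls_count]);
}

static void *demo_alloc(DemoCtx *ctx, DemoPage *pg) {
    if (ctx->c6_tls_count == 0 && demo_refill(ctx, pg) == 0) return NULL;
    return ctx->c6_tls_freelist[--ctx->c6_tls_count];
}

static void demo_free(DemoCtx *ctx, DemoPage *pg, void *blk) {
    if (ctx->c6_tls_count == C6_TLS_SLOTS) demo_drain_half(ctx, pg);
    assert(ctx->c6_tls_count < C6_TLS_SLOTS);  /* a slot is free after draining */
    ctx->c6_tls_freelist[ctx->c6_tls_count++] = blk;
}
```

Draining only half on overflow leaves room for a burst of frees while still keeping blocks available for the next allocations, so neither slow path runs on every call.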

Performance (C6-heavy 257-768B, 2M iters, ws=400):
- v5 OFF baseline: 47M ops/s
- v5 ULTRA: 37-38M ops/s (-20% vs v5 OFF baseline)
- vs v5 base (no opts): +3-5% improvement

Design limitation identified:
- Header write required on every alloc (freelist overwrites header byte)
- Segment validation required on every free
- page->used tracking required for retirement
- These prevent matching baseline pool v1 performance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 13:32:46 +09:00
f191774c1e Phase v5-6: TLS batching for C6 v5
- Add HAKMEM_SMALL_HEAP_V5_BATCH_ENABLED ENV gate (default: 0)
- Add SmallV5Batch struct with 4-slot buffer in SmallHeapCtxV5
- Integrate batch alloc/free paths (after cache, before freelist)
- Fix pre-existing build error in tiny_free_magazine.inc.h (ss_time/tss undeclared)

Benchmarks (C6 257-768B):
- Batch OFF: 36.71M ops/s → Batch ON: 37.78M ops/s (+2.9%)
- Mixed 16-1024B: batch ON 37.09M vs OFF 38.25M (-3%, within noise)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 12:53:03 +09:00
2f5d53fd6d Phase v5-5: TLS cache for C6 v5
Add 1-slot TLS cache to C6 v5 to reduce page_meta access overhead.

Implementation:
- Add HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED ENV (default: 0)
- SmallHeapCtxV5: add c6_cached_block field for TLS cache
- alloc: cache hit bypasses page_meta lookup, returns immediately
- free: empty cache stores block, full cache evicts old block first
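
A 1-slot cache of this shape might look like the sketch below. `DemoCacheCtx` and the `demo_cache_*` helpers are hypothetical names; only the `c6_cached_block` field comes from the commit message. The caller is assumed to fall through to the normal page path when alloc returns NULL, and to release any evicted block through the normal free path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of SmallHeapCtxV5. */
typedef struct DemoCacheCtx {
    void *c6_cached_block;  /* NULL means the 1-slot cache is empty */
} DemoCacheCtx;

/* alloc: a cache hit returns the block without touching page metadata.
 * NULL means "miss, take the regular page_meta path". */
static void *demo_cache_alloc(DemoCacheCtx *ctx) {
    void *blk = ctx->c6_cached_block;
    ctx->c6_cached_block = NULL;
    return blk;
}

/* free: store the block in the cache. If the slot was occupied, the old
 * block is evicted and returned so the caller can free it the slow way. */
static void *demo_cache_free(DemoCacheCtx *ctx, void *blk) {
    assert(blk != NULL);
    void *evicted = ctx->c6_cached_block;
    ctx->c6_cached_block = blk;
    return evicted;
}
```

With a single slot, a tight free-then-alloc pattern of the same size class hits the cache every time, which is the case the C6-heavy benchmark exercises.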

Results (1M iter, ws=400, HEADER_MODE=full):
- C6-heavy (257-768B): 35.53M → 37.02M ops/s (+4.2%)
- Mixed 16-1024B: 38.04M → 37.93M ops/s (-0.3%, noise)

Known issue: header_mode=light has an infinite-loop bug
(the freelist pointer collides with the header). Use full mode only for now.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 07:40:22 +09:00
2a548875b8 Phase v5-4: Header light mode & freelist optimization
Implements header write optimization for C6 v5 allocator by moving
header initialization from per-alloc time to carve time (during page
refill). This eliminates redundant header writes on the hot path.

Implementation:
- Added HAKMEM_SMALL_HEAP_V5_HEADER_MODE ENV (full|light, default: full)
- Added header_mode field to SmallHeapCtxV5 (cached per-thread)
- Modified alloc fast/slow paths to skip header write in light mode
- Modified refill to write headers during carve in light mode
- Free path unchanged (header validation still works)

Benchmark Results (2M iterations, ws=400):

C6-HEAVY (257-768B):
- Baseline (v5 OFF): 47.95 Mops/s
- v5 full mode:       38.97 Mops/s (-18.7% vs baseline)
- v5 light mode:      39.25 Mops/s (-18.1% vs baseline, +0.7% vs full)

MIXED 16-1024B:
- v5 OFF:       43.59 Mops/s
- v5 full mode: 36.53 Mops/s (-16.2% vs OFF)
- v5 light mode: 38.04 Mops/s (-12.7% vs OFF, +4.1% vs full)

Analysis:
- Light mode shows modest improvement over full (+0.7-4.1%)
- C6 v5 performance gap vs baseline (-18%) indicates need for
  further optimization beyond header writes
- Mixed workload benefits more from light mode (+4.1% vs full)
- No regressions in safety/correctness observed

Research findings:
- Header write optimization alone insufficient to close v5 gap
- Need to investigate other hot path costs (freelist ops, metadata access)
- Light mode validates the carve-time header concept

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 05:12:39 +09:00
7b5ee8cee2 Phase v5-3: O(1) path optimization for C6-only v5
- Single TLS segment (eliminates slot search loop)
- O(1) page_meta_of() (direct segment range check, no iteration)
- __builtin_ctz for O(1) free page finding in bitmap
- Simplified free path using page_meta_of() only (no find_page)
- Partial limit 1 (minimal list traversal)
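
Two of the O(1) techniques above, the direct range check and the `__builtin_ctz` bitmap scan, can be illustrated with the sketch below. The `Demo*` names are hypothetical, and the 2 MiB segment size, 64 KiB page size, and 32-bit free bitmap are assumptions made for the example; the real `page_meta_of()` may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SEG_SIZE   (2u << 20)  /* assumed 2 MiB segment */
#define DEMO_PAGE_SHIFT 16          /* assumed 64 KiB pages */
#define DEMO_NUM_PAGES  (DEMO_SEG_SIZE >> DEMO_PAGE_SHIFT)  /* 32 pages */

typedef struct DemoPageMeta { uint32_t used; } DemoPageMeta;

typedef struct DemoSegment {
    uintptr_t base;         /* start address of the segment */
    uint32_t free_bitmap;   /* bit i set => page i is free */
    DemoPageMeta meta[DEMO_NUM_PAGES];
} DemoSegment;

/* O(1) lookup: one subtraction, one compare, one shift. If p is below
 * base, the unsigned subtraction wraps to a huge value and is rejected. */
static DemoPageMeta *demo_page_meta_of(DemoSegment *seg, const void *p) {
    uintptr_t off = (uintptr_t)p - seg->base;
    if (off >= DEMO_SEG_SIZE) return NULL;
    return &seg->meta[off >> DEMO_PAGE_SHIFT];
}

/* O(1) free-page search: index of the lowest set bit, then clear it.
 * Returns -1 when no page is free (__builtin_ctz(0) is undefined). */
static int demo_take_free_page(DemoSegment *seg) {
    if (seg->free_bitmap == 0) return -1;
    int idx = __builtin_ctz(seg->free_bitmap);
    assert(idx < (int)DEMO_NUM_PAGES);
    seg->free_bitmap &= seg->free_bitmap - 1;  /* clear lowest set bit */
    return idx;
}
```

Because a thread owns a single segment in this design, the range check doubles as the ownership test on the free path, with no loop over slots or pages.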

Performance:
- Before (v5-2): 14.7M ops/s
- After (v5-3): 38.5M ops/s (+162%)
- vs baseline: 44.9M ops/s (-14%)
- SEGV: None, stable at ws=800

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 04:33:16 +09:00
4c2869397f Phase v5-3: SmallObject v5 refactoring, moving constants and macros into the box header
Improvements:
- Consolidate constants in box.h (C6_CLASS_IDX, BLOCK_SIZE, PARTIAL_LIMIT)
- Convert list helpers to macros (SMALL_PAGE_V5_PUSH_PARTIAL, etc.)
- Remove duplicate functions (page_push_partial, etc.)
- Move the page_loc_t enum to box.h

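Effects:
- hotbox_v5.c: 339 lines → 263 lines (76 lines removed)
- Code duplication eliminated (now managed through macros)
- Easier to extend in the future
- Type safety preserved (using GCC statement expressions)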

Tests:
- Build succeeds
- Verified working with v5 both OFF and ON
- No performance change (refactoring only)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 04:24:20 +09:00
e0fb7d550a Phase v5-2: SmallObject v5 C6-only full implementation (WIP - header fix)
Implementation fixes:
- Add tiny_region_id_write_header() so that the USER pointer is returned correctly
- Segment lookup from TLS slots (page_meta_of)
- Reuse segments through page-level allocation
- Guarantee 2MiB alignment (reserve 4MiB, then align)
- Fix routing on the free path (remove the fallthrough from v4 to v5)
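
The "reserve 4MiB, then align" step can be sketched as below. This is only an illustration of the arithmetic: `malloc` stands in for whatever reservation call the real code uses (likely `mmap`, which is an assumption), and `DemoReservation` and `demo_reserve_aligned` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_SEG_ALIGN ((size_t)2 << 20)  /* 2 MiB segment size and alignment */

typedef struct DemoReservation {
    void *raw;      /* start of the 4 MiB reservation (needed to release it) */
    void *aligned;  /* 2 MiB-aligned segment start inside the reservation */
} DemoReservation;

/* Reserve twice the segment size, then round the start address up to the
 * next 2 MiB boundary. At most 2 MiB - 1 bytes are skipped, so a full
 * 2 MiB segment always fits in what remains. */
static DemoReservation demo_reserve_aligned(void) {
    DemoReservation r = { NULL, NULL };
    r.raw = malloc(2 * DEMO_SEG_ALIGN);  /* stand-in for the real reservation */
    if (r.raw == NULL) return r;
    uintptr_t raw = (uintptr_t)r.raw;
    uintptr_t up = (raw + DEMO_SEG_ALIGN - 1) & ~(uintptr_t)(DEMO_SEG_ALIGN - 1);
    assert(up + DEMO_SEG_ALIGN <= raw + 2 * DEMO_SEG_ALIGN);
    r.aligned = (void *)up;
    return r;
}
```

With a 2 MiB-aligned segment, the segment base of any block can be recovered by masking off the low 21 bits of its address, which is what makes cheap ownership checks possible.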

Verification:
- SEGV eliminated: basic alloc/free works
- Performance: ~18-20M ops/s (about 40-45% of the 43-47M baseline)
- Cause of the regression: O(n) linear search over TLS slots and O(n) find_page

Remaining tasks:
- Optimize segment lookup to O(1) (hash or direct array indexing)
- Remove find_page (when segment lookup succeeds)
- Optimize partial_count/list management

ENV defaults to OFF, so the mainline is unaffected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 04:14:51 +09:00
9c24bebf08 Phase v5-1: SmallObject v5 C6-only route stub wiring
- tiny_route_env_box.h: add the TINY_ROUTE_SMALL_HEAP_V5 enum; the route snapshot branches C6→v5
- malloc_tiny_fast.h: add a v5 case to the alloc/free switch (falls back to v1/pool)
- smallobject_hotbox_v5.c: stub implementation (alloc returns NULL, free is a no-op)
- smallobject_hotbox_v5_box.h: add a ctx parameter to the function signatures
- Makefile: add core/smallobject_hotbox_v5.o to the link list
- ENV_PROFILE_PRESETS.md: add the v5-1 preset
- CURRENT_TASK.md: record completion of Phase v5-1

**Characteristics**:
- ENV: opt in with HAKMEM_SMALL_HEAP_V5_ENABLED=1 / HAKMEM_SMALL_HEAP_V5_CLASSES=0x40
- Test results: C6-heavy (v5 OFF 15.5M → v5 ON 16.4M ops/s, normal), Mixed 47.2M ops/s, no SEGV/assert
- Behavior is identical to the v1/pool fallback (the implementation comes in v5-2)
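
The opt-in gate can be sketched as follows. The two environment variable names come from the commit message; everything else is an assumption made for illustration: the helper names are hypothetical, bit i of the class mask is assumed to select size class i (so 0x40 selects C6), and an unset class mask is treated as "no classes".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Decide whether a size class is routed to v5, given the raw ENV strings.
 * Kept separate from getenv() so the logic can be tested directly. */
static bool demo_v5_route_enabled(const char *enabled, const char *classes,
                                  int class_idx) {
    assert(class_idx >= 0 && class_idx < 32);
    /* Default OFF: the master switch must be set to exactly 1. */
    if (enabled == NULL || strtol(enabled, NULL, 10) != 1) return false;
    /* Base 0 lets strtoul accept hex such as "0x40" as well as decimal. */
    unsigned long mask = classes ? strtoul(classes, NULL, 0) : 0ul;
    return ((mask >> class_idx) & 1ul) != 0;
}

/* Convenience wrapper reading the process environment. A real allocator
 * would likely evaluate this once and cache the result per thread. */
static bool demo_v5_route_enabled_env(int class_idx) {
    return demo_v5_route_enabled(getenv("HAKMEM_SMALL_HEAP_V5_ENABLED"),
                                 getenv("HAKMEM_SMALL_HEAP_V5_CLASSES"),
                                 class_idx);
}
```

Under these assumptions, `demo_v5_route_enabled("1", "0x40", 6)` is true while class 5 stays on the v1/pool path.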

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 03:25:37 +09:00
83d4096fbc Phase v5-0: add SmallObject v5 design plus type/interface/ENV skeleton
Design document:
- docs/analysis/SMALLOBJECT_V5_DESIGN.md: overall v5 architecture design

New files (v5 skeleton):
- core/box/smallobject_hotbox_v5_box.h: HotBox v5 type definitions
- core/box/smallsegment_v5_box.h: Segment v5 type definitions
- core/box/smallobject_cold_iface_v5.h: ColdIface v5 interface declarations
- core/box/smallobject_v5_env_box.h: ENV gate
- core/smallobject_hotbox_v5.c: implementation stub (full fallback)

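Features:
- Types and interfaces only (v5-0 adds no functionality)
- ENV defaults to OFF (HAKMEM_SMALL_HEAP_V5_ENABLED=0)
- Behavior completely unchanged (confirmed with the Mixed/C6 benchmarks)
- Clear distinction from v4 (*_v5 suffix)
- Ready for staged implementation: v5-1 (stub) → v5-2 (full implementation) → v5-3 (Mixed)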

Phases:
- v5-0: type definitions only (current)
- v5-1: add the C6-only stub route
- v5-2: full Segment/HotBox implementation (C6-only bench A/B)
- v5-3: staged promotion under Mixed (C6 → C5 → ...)

Target performance: 50–60M ops/s on Mixed 16–1024B (50% of mimalloc)

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 03:09:57 +09:00