Commit Graph

4 Commits

Author SHA1 Message Date
2f5d53fd6d Phase v5-5: TLS cache for C6 v5
Add 1-slot TLS cache to C6 v5 to reduce page_meta access overhead.

Implementation:
- Add HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED ENV (default: 0)
- SmallHeapCtxV5: add c6_cached_block field for TLS cache
- alloc: cache hit bypasses page_meta lookup, returns immediately
- free: if the cache is empty, the block is stored in it; if it is full, the previously cached block is evicted first
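Here is a minimal sketch of the 1-slot cache logic described above, under stated assumptions. Only `c6_cached_block` and the ENV name come from the commit. The helpers `c6_alloc`, `c6_free`, `v5_slow_alloc` and `v5_slow_free`, and the fields `tls_cache_enabled`, `freelist` and `slow_allocs`, are hypothetical stand-ins for the page_meta path. The context is passed explicitly so the logic can be tested in isolation. In v5 it lives per thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified context; only c6_cached_block is from the commit. */
typedef struct SmallHeapCtxV5 {
    int   tls_cache_enabled;    /* would mirror HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED */
    void *c6_cached_block;      /* 1-slot cache, NULL when empty */
    void *freelist;             /* stand-in for the page_meta-managed freelist */
    unsigned long slow_allocs;  /* counts trips through the slower path */
} SmallHeapCtxV5;

/* Stand-in for the page_meta path: pop from an intrusive freelist. */
static void *v5_slow_alloc(SmallHeapCtxV5 *ctx) {
    void *b = ctx->freelist;
    if (b) ctx->freelist = *(void **)b;
    ctx->slow_allocs++;
    return b;
}

/* Stand-in for the page_meta path: push onto the intrusive freelist. */
static void v5_slow_free(SmallHeapCtxV5 *ctx, void *b) {
    *(void **)b = ctx->freelist;
    ctx->freelist = b;
}

void *c6_alloc(SmallHeapCtxV5 *ctx) {
    if (ctx->tls_cache_enabled && ctx->c6_cached_block) {
        void *b = ctx->c6_cached_block;   /* hit: no page_meta lookup */
        ctx->c6_cached_block = NULL;
        return b;
    }
    return v5_slow_alloc(ctx);
}

void c6_free(SmallHeapCtxV5 *ctx, void *b) {
    assert(b != NULL);
    if (!ctx->tls_cache_enabled) { v5_slow_free(ctx, b); return; }
    if (ctx->c6_cached_block)             /* full: evict the old block first */
        v5_slow_free(ctx, ctx->c6_cached_block);
    ctx->c6_cached_block = b;             /* keep the most recently freed block */
}
```

In this sketch the evicted block simply goes back onto the freelist, so a free/alloc ping-pong on one block never leaves the cache.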

Results (1M iter, ws=400, HEADER_MODE=full):
- C6-heavy (257-768B): 35.53M → 37.02M ops/s (+4.2%)
- Mixed 16-1024B: 38.04M → 37.93M ops/s (-0.3%, noise)

Known issue: header_mode=light has an infinite-loop bug
(the freelist pointer and the header collide). Use full mode only for now.
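The commit does not describe the exact layout involved. The following is therefore only a generic illustration of how an in-block freelist link and a header can collide, not a diagnosis of the v5 bug. It assumes a hypothetical layout in which a 1-byte header and the freelist next pointer both start at offset 0 of a block. Whichever is written last overwrites the other. On little-endian x86-64, the header byte replaces the low byte of the pointer, and a corrupted link can send freelist traversal to the wrong block. `HDR_MAGIC` and all helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define HDR_MAGIC 0xA6u   /* hypothetical header byte value */

/* Overlapping layout (assumed): header byte and next pointer both at offset 0. */
static void write_header(unsigned char *blk) { blk[0] = HDR_MAGIC; }
static int  header_ok(const unsigned char *blk) { return blk[0] == HDR_MAGIC; }

static void write_next(unsigned char *blk, void *next) {
    memcpy(blk, &next, sizeof next);      /* link occupies bytes [0, sizeof(void *)) */
}

static void *read_next(const unsigned char *blk) {
    void *next;
    memcpy(&next, blk, sizeof next);
    return next;
}

/* Non-overlapping alternative: link stored just past the header byte. */
static void write_next_off1(unsigned char *blk, void *next) {
    memcpy(blk + 1, &next, sizeof next);  /* link occupies bytes [1, 1 + sizeof(void *)) */
}
```

Writing the header after linking corrupts the link, and linking after the header destroys the header. Keeping the link outside the header bytes, as in `write_next_off1`, avoids both.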

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 07:40:22 +09:00
2a548875b8 Phase v5-4: Header light mode & freelist optimization
Implements header write optimization for C6 v5 allocator by moving
header initialization from per-alloc time to carve time (during page
refill). This eliminates redundant header writes on the hot path.

Implementation:
- Added HAKMEM_SMALL_HEAP_V5_HEADER_MODE ENV (full|light, default: full)
- Added header_mode field to SmallHeapCtxV5 (cached per-thread)
- Modified alloc fast/slow paths to skip header write in light mode
- Modified refill to write headers during carve in light mode
- Free path unchanged (header validation still works)
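Here is a sketch of the carve-time header idea, assuming a hypothetical 1-byte header at the start of each block and a freelist link stored just past it, so the two do not overlap. In light mode the refill writes every header once while carving, and the alloc path returns `block + 1` without touching the header. In full mode the header is written on every alloc. `HeapSketch`, `carve_page`, `alloc_block`, `free_block`, the block size and the header value are illustrative, not the v5 code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum header_mode { HEADER_MODE_FULL, HEADER_MODE_LIGHT };

#define BLK_SIZE 64u     /* hypothetical block size */
#define HDR_BYTE 0xA6u   /* hypothetical header value */

typedef struct {
    enum header_mode mode;        /* cached per thread, like header_mode in SmallHeapCtxV5 */
    unsigned char   *free_head;
    unsigned long    header_writes;
} HeapSketch;

/* Freelist link lives at offset 1, past the header byte. */
static void set_next(unsigned char *blk, unsigned char *next) { memcpy(blk + 1, &next, sizeof next); }
static unsigned char *get_next(unsigned char *blk) {
    unsigned char *n;
    memcpy(&n, blk + 1, sizeof n);
    return n;
}

/* Refill: carve `page` into blocks; in light mode, write headers here once. */
static void carve_page(HeapSketch *h, unsigned char *page, size_t page_size) {
    for (size_t i = page_size / BLK_SIZE; i-- > 0;) {
        unsigned char *blk = page + i * BLK_SIZE;
        if (h->mode == HEADER_MODE_LIGHT) { blk[0] = HDR_BYTE; h->header_writes++; }
        set_next(blk, h->free_head);
        h->free_head = blk;
    }
}

/* Returns the USER pointer (block + 1), or NULL when the freelist is empty. */
static void *alloc_block(HeapSketch *h) {
    unsigned char *blk = h->free_head;
    if (!blk) return NULL;
    h->free_head = get_next(blk);
    if (h->mode == HEADER_MODE_FULL) { blk[0] = HDR_BYTE; h->header_writes++; }
    return blk + 1;
}

static void free_block(HeapSketch *h, void *user) {
    unsigned char *blk = (unsigned char *)user - 1;
    assert(blk[0] == HDR_BYTE);   /* header validation still works */
    set_next(blk, h->free_head);
    h->free_head = blk;
}
```

Because the link never touches byte 0, a block freed and reallocated in light mode keeps its carve-time header, and `header_writes` stays at one per carved block.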

Benchmark Results (2M iterations, ws=400):

C6-HEAVY (257-768B):
- v5 OFF (baseline): 47.95 Mops/s
- v5 full mode:      38.97 Mops/s (-18.7% vs baseline)
- v5 light mode:     39.25 Mops/s (-18.1% vs baseline, +0.7% vs full)

MIXED 16-1024B:
- v5 OFF (baseline): 43.59 Mops/s
- v5 full mode:      36.53 Mops/s (-16.2% vs baseline)
- v5 light mode:     38.04 Mops/s (-12.7% vs baseline, +4.1% vs full)
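The percentages above follow the usual relative-change formula, (new / old - 1) × 100, rounded to one decimal place. Here is a small C check that reproduces them from the raw Mops/s figures. `pct_change` and `round1` are illustrative helpers, not part of the benchmark harness.

```c
#include <assert.h>

/* Relative change of `now` against `ref`, in percent: (now / ref - 1) * 100. */
static double pct_change(double now, double ref) {
    assert(ref > 0.0);
    return (now / ref - 1.0) * 100.0;
}

/* Round half away from zero to one decimal place (avoids needing libm). */
static double round1(double x) {
    long t = (long)(x >= 0.0 ? x * 10.0 + 0.5 : x * 10.0 - 0.5);
    return (double)t / 10.0;
}
```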

Analysis:
- Light mode shows a modest improvement over full mode (+0.7% to +4.1%)
- C6 v5 performance gap vs baseline (-18%) indicates need for
  further optimization beyond header writes
- Mixed workload benefits more from light mode (+4.1% vs full)
- No regressions in safety/correctness observed

Research findings:
- Header write optimization alone is insufficient to close the v5 gap
- Need to investigate other hot path costs (freelist ops, metadata access)
- Light mode validates the carve-time header concept

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 05:12:39 +09:00
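4c2869397f Phase v5-3: SmallObject v5 refactoring: consolidate constants and macros into the box
Improvements:
- Consolidated constants in box.h (C6_CLASS_IDX, BLOCK_SIZE, PARTIAL_LIMIT)
- Converted list helpers to macros (SMALL_PAGE_V5_PUSH_PARTIAL, etc.)
- Removed duplicate functions (page_push_partial, etc.)
- Moved the page_loc_t enum to box.h

Effects:
- hotbox_v5.c: 339 lines → 263 lines (76 lines removed)
- Code duplication eliminated (managed via macros)
- Better extensibility for future work
- Type safety preserved (using GCC statement expressions)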
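Here is a rough illustration of what consolidating these into `box.h` might look like. The names `C6_CLASS_IDX`, `BLOCK_SIZE`, `PARTIAL_LIMIT`, `page_loc_t` and `SMALL_PAGE_V5_PUSH_PARTIAL` come from the commit. Their values, the enum members, the `SmallPageV5`/`SmallClassV5` fields and the macro semantics are assumptions: here the macro pushes onto the partial list only while under `PARTIAL_LIMIT` and evaluates to whether the push happened. A GCC statement expression `({ ... })` lets a macro yield a value while evaluating each argument once, and `__typeof__` keeps the argument types intact. This is presumably the kind of type safety the commit refers to.

```c
/* ---- box.h (hypothetical sketch) ---- */
#include <assert.h>
#include <stddef.h>

#define C6_CLASS_IDX  6      /* hypothetical value */
#define BLOCK_SIZE    512u   /* hypothetical C6 block size */
#define PARTIAL_LIMIT 2      /* hypothetical cap on partial pages */

/* Where a page currently lives (hypothetical members). */
typedef enum { LOC_NONE, LOC_CURRENT, LOC_PARTIAL, LOC_FULL } page_loc_t;

typedef struct SmallPageV5 {
    struct SmallPageV5 *next;
    page_loc_t          loc;
} SmallPageV5;

typedef struct {
    SmallPageV5 *partial_head;
    int          partial_count;
} SmallClassV5;

/* Push `page` onto `cls`'s partial list if below PARTIAL_LIMIT.
 * Evaluates to 1 if pushed, 0 otherwise; each argument is evaluated once. */
#define SMALL_PAGE_V5_PUSH_PARTIAL(cls, page) ({   \
    __typeof__(cls)  _c = (cls);                   \
    __typeof__(page) _p = (page);                  \
    int _ok = _c->partial_count < PARTIAL_LIMIT;   \
    if (_ok) {                                     \
        _p->next = _c->partial_head;               \
        _p->loc  = LOC_PARTIAL;                    \
        _c->partial_head = _p;                     \
        _c->partial_count++;                       \
    }                                              \
    _ok;                                           \
})
```

Statement expressions and `__typeof__` are GNU extensions supported by GCC and Clang, not ISO C.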

Testing:
- Build succeeds
- Verified working with v5 both OFF and ON
- No performance change (refactoring only)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 04:24:20 +09:00
e0fb7d550a Phase v5-2: SmallObject v5 C6-only full implementation (WIP - header fix)
Implementation fixes:
- Added tiny_region_id_write_header(): returns the USER pointer correctly
- Segment lookup from the TLS slots (page_meta_of)
- Segment reuse via page-level allocation
- Guaranteed 2 MiB alignment (allocate 4 MiB, then align)
- Fixed free-path routing (removed the fallthrough from v4 to v5)
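The "allocate 4 MiB, then align" step can be sketched with POSIX `mmap`/`munmap`. The approach is to over-allocate by the alignment, round the start up to the next 2 MiB boundary, and return the unused head and tail to the OS. This is a generic version of the technique, assuming segments come from anonymous mmap. `segment_alloc_2mib`, `segment_free_2mib` and `segment_base_of` are hypothetical names, not v5 functions. Once segments are both 2 MiB aligned and 2 MiB in size, masking any interior pointer yields its segment base.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define SEG_SIZE ((size_t)2 << 20)   /* 2 MiB: segment size and alignment */

/* Map SEG_SIZE bytes aligned to SEG_SIZE; returns NULL on failure. */
static void *segment_alloc_2mib(void) {
    size_t span = 2 * SEG_SIZE;                              /* 4 MiB reservation */
    unsigned char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return NULL;
    uintptr_t start   = (uintptr_t)raw;
    uintptr_t aligned = (start + SEG_SIZE - 1) & ~(uintptr_t)(SEG_SIZE - 1);
    size_t head = aligned - start;               /* bytes before the boundary */
    size_t tail = span - head - SEG_SIZE;        /* bytes after the segment */
    if (head) munmap(raw, head);
    if (tail) munmap((unsigned char *)aligned + SEG_SIZE, tail);
    return (void *)aligned;
}

static void segment_free_2mib(void *seg) { munmap(seg, SEG_SIZE); }

/* With 2 MiB-aligned, 2 MiB-sized segments, masking gives the base in O(1). */
static void *segment_base_of(const void *p) {
    return (void *)((uintptr_t)p & ~(uintptr_t)(SEG_SIZE - 1));
}
```

Trimming the head and tail keeps only 2 MiB mapped, so the 4 MiB over-allocation costs address space only briefly.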

Verification:
- SEGV gone: basic alloc/free works
- Performance: ~18-20M ops/s (roughly 40-45% of the 43-47M baseline)
- Regression causes: O(n) linear search over TLS slots, O(n) find_page

Remaining tasks:
- Optimize segment lookup to O(1) (hash or direct array indexing)
- Remove find_page (when segment lookup succeeds)
- Optimize partial_count/list management
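The following sketch contrasts the O(n) TLS slot search with one possible O(1) alternative: a direct-mapped table indexed by segment number, which is effectively the simplest hash. It assumes 2 MiB-aligned segments and 64 KiB pages. `SegmentV5`, `PageMetaV5`, the table size and all function names other than the `page_meta_of` idea are hypothetical. A real table would also need collision and lifetime handling, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_SHIFT      21                               /* 2 MiB segments (assumed) */
#define PAGE_SHIFT     16                               /* 64 KiB pages (assumed) */
#define PAGES_PER_SEG  (1u << (SEG_SHIFT - PAGE_SHIFT))
#define SEG_TABLE_SIZE 1024u                            /* direct-mapped slots */

typedef struct { unsigned used; } PageMetaV5;

typedef struct {
    uintptr_t  base;                                    /* 2 MiB-aligned start */
    PageMetaV5 pages[PAGES_PER_SEG];
} SegmentV5;

/* O(n): scan the thread's segment slots one by one. */
static PageMetaV5 *page_meta_of_linear(SegmentV5 **slots, size_t n, uintptr_t p) {
    for (size_t i = 0; i < n; i++) {
        SegmentV5 *s = slots[i];
        /* Unsigned subtraction also rejects p < base (it wraps to a huge value). */
        if (s && p - s->base < ((uintptr_t)1 << SEG_SHIFT))
            return &s->pages[(p - s->base) >> PAGE_SHIFT];
    }
    return NULL;
}

/* O(1): index by segment number, then confirm the stored base matches. */
static SegmentV5 *seg_table[SEG_TABLE_SIZE];

static void seg_table_register(SegmentV5 *s) {
    seg_table[(s->base >> SEG_SHIFT) % SEG_TABLE_SIZE] = s;
}

static PageMetaV5 *page_meta_of_direct(uintptr_t p) {
    uintptr_t base = p & ~(((uintptr_t)1 << SEG_SHIFT) - 1);
    SegmentV5 *s = seg_table[(base >> SEG_SHIFT) % SEG_TABLE_SIZE];
    if (!s || s->base != base) return NULL;             /* miss or collision */
    return &s->pages[(p - base) >> PAGE_SHIFT];
}
```

Once the segment is found this way, the page index falls out of a shift, which is one way the separate find_page step could become unnecessary.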

The ENV default is OFF, so mainline is unaffected.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 04:14:51 +09:00