Commit Graph

3 Commits

Author SHA1 Message Date
8789542a9f Phase v5-7: C6 ULTRA pattern (research mode, 32-slot TLS freelist)
Implementation:
- ENV: HAKMEM_SMALL_HEAP_V5_ULTRA_C6_ENABLED=0|1 (default: 0)
- SmallHeapCtxV5: added c6_tls_freelist[32], c6_tls_count, ultra_c6_enabled
- small_segment_v5_owns_ptr_fast(): lightweight segment check for free path
- small_alloc_slow_v5_c6_refill(): batch TLS fill from page freelist
- small_free_slow_v5_c6_drain(): drain half of TLS to page on overflow

Performance (C6-heavy 257-768B, 2M iters, ws=400):
- v5 OFF baseline: 47M ops/s
- v5 ULTRA: 37-38M ops/s (-20%)
- vs v5 base (no opts): +3-5% improvement

Design limitation identified:
- Header write required on every alloc (freelist overwrites header byte)
- Segment validation required on every free
- page->used tracking required for retirement
- These prevent matching baseline pool v1 performance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 13:32:46 +09:00
7b5ee8cee2 Phase v5-3: O(1) path optimization for C6-only v5
- Single TLS segment (eliminates slot search loop)
- O(1) page_meta_of() (direct segment range check, no iteration)
- __builtin_ctz for O(1) free page finding in bitmap
- Simplified free path using page_meta_of() only (no find_page)
- Partial limit 1 (minimal list traversal)

Performance:
- Before (v5-2): 14.7M ops/s
- After (v5-3): 38.5M ops/s (+162%)
- vs baseline: 44.9M ops/s (-14%)
- SEGV: None, stable at ws=800

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 04:33:16 +09:00
e0fb7d550a Phase v5-2: SmallObject v5 C6-only 本実装 (WIP - header fix)
本実装修正:
- tiny_region_id_write_header() を追加: USER pointer を正しく返す
- TLS slot からの segment 探索 (page_meta_of)
- Page-level allocation で segment 再利用
- 2MiB alignment 保証 (4MiB 確保 + alignment)
- free パスの route 修正 (v4 から v5 への fallthrough 削除)

動作確認:
- SEGV 消失: alloc/free 基本動作 OK
- 性能: ~18-20M ops/s (baseline 43-47M の約 40-45%)
- 回帰原因: TLS slot 線形探索 O(n)、find_page O(n)

残タスク:
- O(1) segment lookup 最適化 (hash または array 直接参照)
- find_page 除去 (segment lookup 成功時)
- partial_count/list 管理の最適化

ENV デフォルト OFF なので本線影響なし。

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 04:14:51 +09:00