Commit Graph

5 Commits

Author SHA1 Message Date
6f559e1a1d v7-7: Implement Learner for dynamic C5 route switching
- Add SmallLearnerStatsV7 type + API to policy box
- Hook ColdIface refill/retire to collect stats (capacity-based)
- Implement C5 route switching: if C5 ratio < 30%, switch to MID_V3
- Version-based TLS cache invalidation for policy updates
- Evaluation interval: every 100 refills

Tested with c6heavy scenario: C5 ratio=12% triggers V7 → MID_V3 switch

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 05:51:27 +09:00
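
The learner in 6f559e1a1d can be sketched roughly as below. This is a minimal illustration, not the committed code: the field layout of `SmallLearnerStatsV7`, the `SmallPolicyV7` and `SmallPolicyTlsCache` structs, the route enum, and every function name are assumptions. Only the 30% threshold, the 100-refill evaluation interval, the capacity-based accounting, and the version-based invalidation come from the commit message.

```c
#include <stdint.h>

/* Hypothetical route identifiers; only the name MID_V3 appears in the commit. */
typedef enum { ROUTE_V7 = 0, ROUTE_MID_V3 = 1 } small_route_t;

/* Assumed field layout; the commit names the type but not its members. */
typedef struct {
    uint64_t refill_total;  /* refills observed through ColdIface */
    uint64_t c5_capacity;   /* block capacity handed to C5 pages */
    uint64_t c6_capacity;   /* block capacity handed to C6 pages */
} SmallLearnerStatsV7;

/* Assumed shared policy: the version is bumped on every change. */
typedef struct {
    small_route_t c5_route;
    uint32_t version;
} SmallPolicyV7;

/* Assumed per-thread copy of the policy. */
typedef struct {
    small_route_t c5_route;
    uint32_t seen_version;
} SmallPolicyTlsCache;

#define LEARNER_EVAL_INTERVAL  100u /* from the commit: every 100 refills */
#define LEARNER_C5_MIN_PERCENT 30u  /* from the commit: C5 ratio < 30% */

/* Cold path: record one refill and re-evaluate the C5 route periodically. */
static inline void learner_on_refill(SmallLearnerStatsV7 *st, SmallPolicyV7 *pol,
                                     int class_idx, uint32_t capacity)
{
    if (class_idx == 5) st->c5_capacity += capacity;
    else if (class_idx == 6) st->c6_capacity += capacity;

    st->refill_total++;
    if (st->refill_total % LEARNER_EVAL_INTERVAL != 0) return;

    uint64_t total = st->c5_capacity + st->c6_capacity;
    if (total == 0) return;

    uint64_t c5_percent = st->c5_capacity * 100u / total;
    if (c5_percent < LEARNER_C5_MIN_PERCENT && pol->c5_route != ROUTE_MID_V3) {
        pol->c5_route = ROUTE_MID_V3;
        pol->version++; /* stale TLS copies notice this on their next read */
    }
}

/* Hot path: reload the cached route only when the policy version moved. */
static inline small_route_t tls_c5_route(SmallPolicyTlsCache *tls,
                                         const SmallPolicyV7 *pol)
{
    if (tls->seen_version != pol->version) {
        tls->c5_route = pol->c5_route;
        tls->seen_version = pol->version;
    }
    return tls->c5_route;
}
```

With 88 C6 refills and 12 C5 refills of equal capacity, the C5 ratio is 12%, so the 100th refill flips the C5 route to MID_V3, which matches the c6heavy result reported above. A real multi-threaded policy box would need atomic access to the version field; the sketch omits that.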
d5aa3110c6 Phase v7-5b: C5+C6 multi-class expansion (+4.3% improvement)
- Add C5 (256B blocks) support alongside C6 (512B blocks)
- Same segment shared between C5/C6 (page_meta.class_idx distinguishes)
- SMALL_V7_CLASS_SUPPORTED() macro for class validation
- Extend small_v7_block_size() for C5 (switch statement)

A/B Result: C6-only v7 avg 7.64M ops/s → C5+C6 v7 avg 7.97M ops/s (+4.3%)
Criteria: C6 protected, C5 net positive, TLS bloat none

ENV: HAKMEM_SMALL_HEAP_V7_CLASSES=0x60 (bit5+bit6)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 05:11:02 +09:00
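
The class gating in d5aa3110c6 might look something like the following. `SMALL_V7_CLASS_SUPPORTED()` and `small_v7_block_size()` are named in the commit, but their bodies here are guesses; `small_v7_parse_class_mask()` and `small_v7_class_enabled()` are hypothetical helpers, and treating an unset variable as "no classes enabled" is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed definition: v7 only knows C5 and C6 at this phase. */
#define SMALL_V7_CLASS_SUPPORTED(ci) ((ci) == 5 || (ci) == 6)

/* Block size per class, as stated in the commit (C5 = 256B, C6 = 512B). */
static inline size_t small_v7_block_size(int class_idx)
{
    switch (class_idx) {
    case 5:  return 256;
    case 6:  return 512;
    default: return 0; /* class not handled by v7 */
    }
}

/* Hypothetical helper: parse a mask string such as "0x60".
 * Base 0 lets strtoul accept hex, octal, or decimal input. */
static inline uint32_t small_v7_parse_class_mask(const char *value)
{
    if (value == NULL || *value == '\0') return 0; /* assumption: unset = off */
    return (uint32_t)strtoul(value, NULL, 0);
}

/* Hypothetical helper: a class is routed to v7 only if v7 supports it
 * and its bit is set in the mask. */
static inline int small_v7_class_enabled(uint32_t mask, int class_idx)
{
    return SMALL_V7_CLASS_SUPPORTED(class_idx) && ((mask >> class_idx) & 1u);
}
```

The mask would typically come from `getenv("HAKMEM_SMALL_HEAP_V7_CLASSES")` once at startup. A value of 0x60 sets bit 5 and bit 6, enabling both classes; 0x40 would reproduce the C6-only configuration used as the A/B baseline.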
17ceed619c Phase v7-5a: Hot path stats removal (C6 v7 extreme optimization)
- Remove per-page stats from hot path (alloc_count, free_count, live_current)
- Add ENV-gated global atomic stats (HAKMEM_V7_HOT_STATS)
- Stats now collected only at retire time (cold path)
- Header write kept at alloc time (freelist overlaps block[0])

A/B Result: -4.3% overhead → ±0% (target: legacy ±2%)
v7 OFF avg: 9.26M ops/s, v7 ON avg: 9.27M ops/s (+0.15%)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 04:51:17 +09:00
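
One way to gate hot-path counters behind `HAKMEM_V7_HOT_STATS`, as 17ceed619c describes, is sketched below. Only the environment variable name is taken from the commit; the counter names, the parsing rule (anything other than empty or a leading `0` enables stats), and the lazy initialisation are assumptions.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical global counters replacing the per-page hot-path fields. */
static _Atomic uint64_t g_v7_alloc_total;
static _Atomic uint64_t g_v7_free_total;

/* -1 = not yet read from the environment, 0 = off, 1 = on. */
static _Atomic int g_v7_hot_stats_state = -1;

/* Assumed parsing rule: unset, empty, or a leading '0' disables stats. */
static inline int v7_hot_stats_parse(const char *value)
{
    return value != NULL && value[0] != '\0' && value[0] != '0';
}

/* Read the environment once; later calls cost one relaxed load. */
static inline int v7_hot_stats_enabled(void)
{
    int state = atomic_load_explicit(&g_v7_hot_stats_state, memory_order_relaxed);
    if (state < 0) {
        state = v7_hot_stats_parse(getenv("HAKMEM_V7_HOT_STATS"));
        atomic_store_explicit(&g_v7_hot_stats_state, state, memory_order_relaxed);
    }
    return state;
}

/* Hot path hooks: a predictable branch when stats are off. */
static inline void v7_stat_alloc(void)
{
    if (v7_hot_stats_enabled())
        atomic_fetch_add_explicit(&g_v7_alloc_total, 1, memory_order_relaxed);
}

static inline void v7_stat_free(void)
{
    if (v7_hot_stats_enabled())
        atomic_fetch_add_explicit(&g_v7_free_total, 1, memory_order_relaxed);
}
```

Relaxed ordering is enough here because the counters are only read for reporting and do not synchronise other data. With the variable unset, the alloc and free paths touch no shared cache line, which is consistent with the ±0% result reported above.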
2bdf29a9ed Phase v7-3: TLS segment fast path optimization (RegionIdBox overhead reduction)
- SmallHeapCtx_v7: Add TLS segment hints (tls_seg_base/end) for fast bounds check
- free fast path: TLS segment hit → skip RegionIdBox binary search
- Simplified control flow: removed same-page cache (negligible benefit vs branch cost)
- Optimization: O(1) page_idx calculation via bit shift vs O(log N) RegionIdBox lookup

Performance improvement:
- Phase v7-2: 54.5M ops/s (-7.0% vs 58.6M legacy)
- Phase v7-3: 56.3M ops/s (-4.3% vs legacy)
- Overhead reduction: 38% (from -7.0% to -4.3%)

TLS segment hit path bypasses RegionIdBox for most C6 frees.
Remaining -4.3% overhead acceptable for modular v7 architecture.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 03:38:39 +09:00
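
The TLS segment hit test from 2bdf29a9ed can be illustrated as follows. The `tls_seg_base` and `tls_seg_end` field names come from the commit; the 64 KiB page size (shift of 16) and the function name `small_v7_tls_page_idx()` are assumptions made only so the sketch compiles.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 64 KiB pages. The commit only says page_idx is a bit shift. */
#define SMALL_V7_PAGE_SHIFT 16

/* Only the two hint fields named in the commit are shown. */
typedef struct {
    uintptr_t tls_seg_base; /* first byte of this thread's segment */
    uintptr_t tls_seg_end;  /* one past the last byte of the segment */
} SmallHeapCtx_v7;

/* Returns 1 and stores the page index when ptr lies inside the TLS segment.
 * Returns 0 on a miss, in which case the caller would fall back to the
 * O(log N) RegionIdBox lookup. */
static inline int small_v7_tls_page_idx(const SmallHeapCtx_v7 *ctx,
                                        const void *ptr, size_t *page_idx_out)
{
    uintptr_t p = (uintptr_t)ptr;
    if (p < ctx->tls_seg_base || p >= ctx->tls_seg_end)
        return 0;
    *page_idx_out = (size_t)((p - ctx->tls_seg_base) >> SMALL_V7_PAGE_SHIFT);
    return 1;
}
```

Two comparisons and a shift replace the binary search whenever a block is freed by the thread that owns its segment, which the commit reports is the case for most C6 frees.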
39a3c53dbc Phase v7-2: SmallObject v7 C6-only implementation with RegionIdBox integration
- SmallSegment_v7: 2MiB segment with TLS slot and free page stack
- ColdIface_v7: Page refill/retire between HotBox and SegmentBox
- HotBox_v7: Full C6-only alloc/free with header writing (HEADER_MAGIC|class_idx)
- Free path early-exit: Check v7 route BEFORE ss_fast_lookup (separate mmap segment)
- RegionIdBox: Register v7 segment for ptr->region lookup
- Benchmark: v7 ON ~54.5M ops/s (-7% overhead vs 58.6M legacy baseline)

v7 correctly balances alloc/free counts and page lifecycle.
RegionIdBox overhead identified as primary cost driver.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-12 03:12:28 +09:00
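
The header write mentioned in 39a3c53dbc (`HEADER_MAGIC|class_idx`) could work along these lines. Everything concrete here is assumed: the 1-byte header, the magic value 0xA0, the 4-bit class field, the user pointer starting one byte after the header, and all function names. The sketch also shows why a freelist link stored in `block[0]` forces the header to be rewritten on every allocation, as noted in the later v7-5a commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_MAGIC      0xA0u /* ASSUMPTION: real value not in the log */
#define HEADER_CLASS_MASK 0x0Fu /* ASSUMPTION: class index in the low nibble */

/* Free: store the current list head at the start of the block,
 * overwriting the header byte, then make the block the new head. */
static inline void hotbox_v7_push(void **freelist, void *user_ptr)
{
    uint8_t *blk = (uint8_t *)user_ptr - 1;
    void *head = *freelist;
    memcpy(blk, &head, sizeof head); /* memcpy avoids unaligned stores */
    *freelist = blk;
}

/* Alloc: pop the head, then restore the header the link clobbered. */
static inline void *hotbox_v7_pop(void **freelist, int class_idx)
{
    uint8_t *blk = *freelist;
    if (blk == NULL) return NULL;

    void *next;
    memcpy(&next, blk, sizeof next);
    *freelist = next;

    blk[0] = (uint8_t)(HEADER_MAGIC | ((unsigned)class_idx & HEADER_CLASS_MASK));
    return blk + 1; /* assumed: user data starts right after the header */
}

/* Recover the class index from a user pointer, or -1 if the magic is absent. */
static inline int small_v7_header_class(const void *user_ptr)
{
    const uint8_t *blk = (const uint8_t *)user_ptr - 1;
    if ((blk[0] & 0xF0u) != HEADER_MAGIC) return -1;
    return (int)(blk[0] & HEADER_CLASS_MASK);
}
```

Because the link and the header share the first bytes of the block, a freed block temporarily loses its header, and only the pop path can restore it.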