79674c9390
Phase v10: Remove legacy v3/v4/v5 implementations
...
Removal strategy: Deprecate routes by disabling ENV-based routing
- v3/v4/v5 enum types kept for binary compatibility
- small_heap_v3/v4/v5_enabled() always return 0
- small_heap_v3/v4/v5_class_enabled() always return 0
- Any v3/v4/v5 ENV variables are silently ignored; allocation routes to LEGACY
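The stubbing strategy listed above could look roughly like the sketch below. The enum name, its numeric values, the `pick_small_route()` helper, and the `class_idx` parameter are assumptions for illustration; only the `small_heap_v3/v4/v5_enabled()` and `small_heap_v5_class_enabled()` names come from the commit message.

```c
#include <assert.h>

/* Hypothetical route enum. The legacy entries stay in place so the numeric
 * values of the other entries do not shift (binary compatibility). */
typedef enum {
    SMALL_ROUTE_LEGACY = 0,
    SMALL_ROUTE_V3     = 1,  /* kept, never selected */
    SMALL_ROUTE_V4     = 2,  /* kept, never selected */
    SMALL_ROUTE_V5     = 3   /* kept, never selected */
} small_route_t;

static_assert(SMALL_ROUTE_V5 == 3, "legacy enum values must not shift");

/* Stubbed gates: no getenv() call is made, so any v3/v4/v5 ENV is ignored. */
static inline int small_heap_v3_enabled(void) { return 0; }
static inline int small_heap_v4_enabled(void) { return 0; }
static inline int small_heap_v5_enabled(void) { return 0; }

/* Assumed signature: the per-class gate takes a size-class index. */
static inline int small_heap_v5_class_enabled(unsigned class_idx) {
    (void)class_idx;
    return 0;
}

/* Hypothetical router: with every gate stubbed, it always falls through. */
static inline small_route_t pick_small_route(unsigned class_idx) {
    if (small_heap_v5_enabled() && small_heap_v5_class_enabled(class_idx))
        return SMALL_ROUTE_V5;
    if (small_heap_v4_enabled()) return SMALL_ROUTE_V4;
    if (small_heap_v3_enabled()) return SMALL_ROUTE_V3;
    return SMALL_ROUTE_LEGACY;
}
```

Because the gates are compile-time constants, a compiler can fold the dead branches away, which is consistent with the "no regression expected" note.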
Changes:
- core/box/smallobject_hotbox_v3_env_box.h: stub functions
- core/box/smallobject_hotbox_v4_env_box.h: stub functions
- core/box/smallobject_v5_env_box.h: stub functions
- core/front/malloc_tiny_fast.h: remove alloc/free cases (20+ lines)
Benefits:
- Cleaner routing logic (v6/v7 only for SmallObject)
- 20+ lines deleted from hot path validation
- No behavioral change under default settings (the routes were off by default and rarely enabled)
Performance: No regression expected (v3/v4/v5 already disabled by default)
Next: Set Learner v7 default ON, production testing
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 06:09:12 +09:00
8789542a9f
Phase v5-7: C6 ULTRA pattern (research mode, 32-slot TLS freelist)
...
Implementation:
- ENV: HAKMEM_SMALL_HEAP_V5_ULTRA_C6_ENABLED=0|1 (default: 0)
- SmallHeapCtxV5: added c6_tls_freelist[32], c6_tls_count, ultra_c6_enabled
- small_segment_v5_owns_ptr_fast(): lightweight segment check for free path
- small_alloc_slow_v5_c6_refill(): batch TLS fill from page freelist
- small_free_slow_v5_c6_drain(): drain half of TLS to page on overflow
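A minimal sketch of the 32-slot TLS freelist with batch refill and drain-half-on-overflow, as described in the list above. The `Page`/`Block` types, all function names here, and the intrusive page freelist are assumptions for illustration; header writes, segment checks, and `page->used` tracking are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define C6_TLS_SLOTS 32u

/* Hypothetical intrusive freelist node and page. */
typedef struct Block { struct Block *next; } Block;
typedef struct { Block *free_list; } Page;

/* Per-thread context fields named after the commit message. */
typedef struct {
    void    *c6_tls_freelist[C6_TLS_SLOTS];
    unsigned c6_tls_count;
} Ctx;

static inline void page_push(Page *p, void *b) {
    Block *blk = (Block *)b;
    blk->next = p->free_list;
    p->free_list = blk;
}

static inline void *page_pop(Page *p) {
    Block *b = p->free_list;
    if (b != NULL) p->free_list = b->next;
    return b;
}

/* Slow path: batch-fill the TLS array from the page freelist. */
static inline unsigned c6_refill(Ctx *c, Page *p) {
    unsigned moved = 0;
    while (c->c6_tls_count < C6_TLS_SLOTS) {
        void *b = page_pop(p);
        if (b == NULL) break;
        c->c6_tls_freelist[c->c6_tls_count++] = b;
        moved++;
    }
    return moved;
}

/* Slow path: return the upper half of the TLS array to the page. */
static inline void c6_drain_half(Ctx *c, Page *p) {
    unsigned keep = c->c6_tls_count / 2;
    while (c->c6_tls_count > keep)
        page_push(p, c->c6_tls_freelist[--c->c6_tls_count]);
}

static inline void *c6_alloc(Ctx *c, Page *p) {
    if (c->c6_tls_count == 0 && c6_refill(c, p) == 0) return NULL;
    return c->c6_tls_freelist[--c->c6_tls_count];
}

static inline void c6_free(Ctx *c, Page *p, void *b) {
    assert(b != NULL);
    if (c->c6_tls_count == C6_TLS_SLOTS) c6_drain_half(c, p);
    c->c6_tls_freelist[c->c6_tls_count++] = b;
}
```

Draining only half on overflow leaves room for a burst of frees while keeping blocks available for the next allocations, so the slow path is not hit on every call at the boundary.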
Performance (C6-heavy 257-768B, 2M iters, ws=400):
- v5 OFF baseline: 47M ops/s
- v5 ULTRA: 37-38M ops/s (-20%)
- vs v5 base (no opts): +3-5% improvement
Design limitation identified:
- Header write required on every alloc (freelist overwrites header byte)
- Segment validation required on every free
- page->used tracking required for retirement
- These prevent matching baseline pool v1 performance
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 13:32:46 +09:00
f191774c1e
Phase v5-6: TLS batching for C6 v5
...
- Add HAKMEM_SMALL_HEAP_V5_BATCH_ENABLED ENV gate (default: 0)
- Add SmallV5Batch struct with 4-slot buffer in SmallHeapCtxV5
- Integrate batch alloc/free paths (after cache, before freelist)
- Fix pre-existing build error in tiny_free_magazine.inc.h (ss_time/tss undeclared)
Benchmarks (C6 257-768B):
- Batch OFF: 36.71M ops/s → Batch ON: 37.78M ops/s (+2.9%)
- Mixed 16-1024B: batch ON 37.09M vs OFF 38.25M (-3%, within noise)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-11 12:53:03 +09:00
2f5d53fd6d
Phase v5-5: TLS cache for C6 v5
...
Add 1-slot TLS cache to C6 v5 to reduce page_meta access overhead.
Implementation:
- Add HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED ENV (default: 0)
- SmallHeapCtxV5: add c6_cached_block field for TLS cache
- alloc: cache hit bypasses page_meta lookup, returns immediately
- free: empty cache stores block, full cache evicts old block first
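The alloc/free behavior of the 1-slot cache described above can be sketched as follows. Only the `c6_cached_block` field name comes from the commit; the function names and the convention of returning the evicted block to the caller are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Per-thread context reduced to the one field this sketch needs. */
typedef struct {
    void *c6_cached_block;  /* NULL means the cache is empty */
} Ctx;

/* Alloc: a hit returns the cached block without touching page_meta.
 * A NULL result is a miss; the caller continues on the normal path. */
static inline void *c6_cache_alloc(Ctx *c) {
    void *b = c->c6_cached_block;
    c->c6_cached_block = NULL;
    return b;
}

/* Free: store the block in the cache. If the slot was occupied, the old
 * block is evicted and returned so the caller can push it to its page. */
static inline void *c6_cache_free(Ctx *c, void *b) {
    assert(b != NULL);
    void *evicted = c->c6_cached_block;
    c->c6_cached_block = b;
    return evicted;
}
```

Keeping the most recently freed block in the slot favors alloc-after-free patterns, where the next allocation reuses a block that is likely still warm in cache.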
Results (1M iter, ws=400, HEADER_MODE=full):
- C6-heavy (257-768B): 35.53M → 37.02M ops/s (+4.2%)
- Mixed 16-1024B: 38.04M → 37.93M ops/s (-0.3%, noise)
Known issue: header_mode=light has an infinite-loop bug
(freelist pointer/header collision). Use full mode only for now.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 07:40:22 +09:00
2a548875b8
Phase v5-4: Header light mode & freelist optimization
...
Implements header write optimization for C6 v5 allocator by moving
header initialization from per-alloc time to carve time (during page
refill). This eliminates redundant header writes on the hot path.
Implementation:
- Added HAKMEM_SMALL_HEAP_V5_HEADER_MODE ENV (full|light, default: full)
- Added header_mode field to SmallHeapCtxV5 (cached per-thread)
- Modified alloc fast/slow paths to skip header write in light mode
- Modified refill to write headers during carve in light mode
- Free path unchanged (header validation still works)
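The difference between the two header modes can be sketched as below. The 1-byte header at offset 0, the `0xA0` tag value, and every function name are assumptions for illustration; only the `full|light` values and the idea of moving the write from alloc time to carve time come from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { HDR_MODE_FULL = 0, HDR_MODE_LIGHT = 1 } hdr_mode_t;

#define HDR_MAGIC 0xA0u  /* hypothetical tag in the high nibble */

/* Parse the ENV value; anything other than "light" means full (default). */
static inline hdr_mode_t hdr_mode_parse(const char *s) {
    return (s != NULL && strcmp(s, "light") == 0) ? HDR_MODE_LIGHT
                                                  : HDR_MODE_FULL;
}

static inline uint8_t hdr_byte(uint8_t class_idx) {
    return (uint8_t)(HDR_MAGIC | (class_idx & 0x0Fu));
}

/* Carve a page into blocks. Light mode writes every header once, here. */
static inline size_t carve_page(uint8_t *page, size_t page_size,
                                size_t block_size, uint8_t class_idx,
                                hdr_mode_t mode) {
    assert(block_size > 0);
    size_t n = page_size / block_size;
    if (mode == HDR_MODE_LIGHT) {
        for (size_t i = 0; i < n; i++)
            page[i * block_size] = hdr_byte(class_idx);
    }
    return n;
}

/* Alloc tail: full mode writes the header on every allocation,
 * light mode relies on the byte written at carve time. */
static inline void *alloc_finish(uint8_t *block, uint8_t class_idx,
                                 hdr_mode_t mode) {
    if (mode == HDR_MODE_FULL) block[0] = hdr_byte(class_idx);
    return block + 1;  /* user pointer follows the header byte */
}
```

Light mode is only safe if nothing overwrites byte 0 while a block sits on a freelist; a freelist link stored at offset 0 would clobber the carve-time header.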
Benchmark Results (2M iterations, ws=400):
C6-HEAVY (257-768B):
- Baseline (v5 OFF): 47.95 Mops/s
- v5 full mode: 38.97 Mops/s (-18.7% vs baseline)
- v5 light mode: 39.25 Mops/s (-18.1% vs baseline, +0.7% vs full)
MIXED 16-1024B:
- v5 OFF: 43.59 Mops/s
- v5 full mode: 36.53 Mops/s (-16.2% vs OFF)
- v5 light mode: 38.04 Mops/s (-12.7% vs OFF, +4.1% vs full)
Analysis:
- Light mode shows modest improvement over full (+0.7-4.1%)
- C6 v5 performance gap vs baseline (-18%) indicates need for
further optimization beyond header writes
- Mixed workload benefits more from light mode (+4.1% vs full)
- No regressions in safety/correctness observed
Research findings:
- Header write optimization alone insufficient to close v5 gap
- Need to investigate other hot path costs (freelist ops, metadata access)
- Light mode validates the carve-time header concept
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 05:12:39 +09:00
dedfea27d5
Phase v5-0 refactor: unify ENV handling, add macros, optimize struct layout
...
- Unify ENV initialization on a sentinel pattern
  - Add ENV_UNINIT/ENABLED/DISABLED constants
  - Optimize the initialization check with __builtin_expect
  - Switch small_heap_v5_enabled/class_mask to the unified pattern
- Convert pointer math to macros (O(1) segment/page computation)
  - SMALL_SEGMENT_V5_BASE_FROM_PTR: compute the segment base from ptr with a mask
  - SMALL_SEGMENT_V5_PAGE_IDX: compute page_idx within the segment with a shift
  - SMALL_SEGMENT_V5_PAGE_META: O(1) access to page_meta (with bounds check)
  - SMALL_SEGMENT_V5_VALIDATE_MAGIC: magic validation
  - SMALL_SEGMENT_V5_VALIDATE_PTR: Fail-Fast validation pipeline
- Add partial_count to SmallClassHeapV5
  - Counter for the partial page list (for refill/retire optimization)
- Reorder SmallPageMetaV5 fields (L1 cache optimization)
  - Group hot fields (free_list, used, capacity) at the front
  - Place metadata (class_idx, flags, page_idx, segment) at the back
  - 24B total; offset comments added
- Add route priority ENV
  - HAKMEM_ROUTE_PRIORITY={v4|v5|auto} (default: v4)
  - Define enum small_route_priority
  - Add small_route_priority() function
- Add segment_size override ENV
  - HAKMEM_SMALL_HEAP_V5_SEGMENT_SIZE (default: 2MiB)
  - Validated as a power of 2 and >= 64KiB
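The sentinel-based ENV initialization from the first bullet group might be sketched like this. The numeric values of the constants, the `env_flag_parse()` helper, the global variable name, and the "only the string 1 enables" rule are assumptions; `__builtin_expect` is the GCC/Clang builtin named in the commit.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed values: a negative sentinel marks "not read yet". */
enum { ENV_UNINIT = -1, ENV_DISABLED = 0, ENV_ENABLED = 1 };

/* Hypothetical cached gate, starts out uninitialized. */
static int g_small_heap_v5_enabled = ENV_UNINIT;

/* Hypothetical parser: only the exact string "1" enables the feature. */
static inline int env_flag_parse(const char *s) {
    return (s != NULL && s[0] == '1' && s[1] == '\0') ? ENV_ENABLED
                                                      : ENV_DISABLED;
}

static inline int small_heap_v5_enabled(void) {
    /* The first call is the rare case, so hint the branch as unlikely. */
    if (__builtin_expect(g_small_heap_v5_enabled == ENV_UNINIT, 0)) {
        g_small_heap_v5_enabled =
            env_flag_parse(getenv("HAKMEM_SMALL_HEAP_V5_ENABLED"));
    }
    assert(g_small_heap_v5_enabled != ENV_UNINIT);
    return g_small_heap_v5_enabled;
}
```

After the first call the check is one compare against a cached integer, which is why the pattern suits a gate that is consulted on hot paths. This sketch ignores thread-safety of the first initialization.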
Behavior: completely unchanged (the v5 route is never called; ENV default OFF)
Tests: Mixed 16–1024B at 43.0–43.8M ops/s (no change); no SEGV or assert failures
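A sketch of the mask/shift pointer arithmetic and the segment-size validation from this commit, written as inline functions rather than the actual macros. It assumes segments are aligned to their own size and that pages are 64 KiB; the function names and `PAGE_SHIFT` are hypothetical, while the 2 MiB default and the power-of-2 / >= 64KiB rule come from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_SIZE   ((uintptr_t)2 << 20)  /* 2 MiB default from the commit */
#define PAGE_SHIFT 16                    /* assumption: 64 KiB pages */

static_assert((SEG_SIZE & (SEG_SIZE - 1)) == 0,
              "mask arithmetic needs a power-of-2 segment size");

/* Validation rule for the override ENV: power of 2 and at least 64 KiB. */
static inline int seg_size_valid(size_t sz) {
    return sz >= ((size_t)64 << 10) && (sz & (sz - 1)) == 0;
}

/* Segment base: clear the low bits (valid only for size-aligned segments). */
static inline uintptr_t seg_base_from_ptr(uintptr_t p) {
    return p & ~(SEG_SIZE - 1);
}

/* Page index: offset within the segment, shifted by the page size. */
static inline size_t seg_page_idx(uintptr_t p) {
    return (size_t)((p & (SEG_SIZE - 1)) >> PAGE_SHIFT);
}
```

Both lookups are a constant number of bitwise operations with no memory access, which is what makes the segment and page computation O(1).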
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 03:19:18 +09:00
83d4096fbc
Phase v5-0: Add SmallObject v5 design and type/interface/ENV skeleton
...
Design document:
- docs/analysis/SMALLOBJECT_V5_DESIGN.md: overall v5 architecture design
New files (v5 skeleton):
- core/box/smallobject_hotbox_v5_box.h: HotBox v5 type definitions
- core/box/smallsegment_v5_box.h: Segment v5 type definitions
- core/box/smallobject_cold_iface_v5.h: ColdIface v5 interface declarations
- core/box/smallobject_v5_env_box.h: ENV gate
- core/smallobject_hotbox_v5.c: implementation stub (full fallback)
Highlights:
✅ Only types and interfaces are defined (v5-0 has no functionality)
✅ ENV default OFF (HAKMEM_SMALL_HEAP_V5_ENABLED=0)
✅ Behavior completely unchanged (confirmed with Mixed/C6 benchmarks)
✅ Clear distinction from v4 (*_v5 suffix)
✅ Ready for staged implementation: v5-1 (stub) → v5-2 (full implementation) → v5-3 (Mixed)
Phases:
- v5-0: type definitions only (current)
- v5-1: add C6-only stub route
- v5-2: full Segment/HotBox implementation (C6-only bench A/B)
- v5-3: staged promotion on Mixed (C6 → C5 → ...)
Target performance: 50–60M ops/s on Mixed 16–1024B (50% of mimalloc)
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 03:09:57 +09:00