2025-12-11 03:09:57 +09:00

# SmallObject HotBox v5 Design Document

## Purpose

Consolidate the 16 B–2 KiB small-object/mid band into **SmallObjectHotBox_v5** and bring Mixed 16–1024B into the class of **roughly half of mimalloc's throughput (50–60M ops/s)**.

v4 is archived as a lesson learned: it depended on TinyHeap and carried heavy page management. C5/C6 go on **v5**, not v4.

---

## Box Structure

### Hot Path: SmallObjectHotBox_v5

- **Type**: `SmallHeapCtxV5` (per-thread)
- **State**: `SmallClassHeapV5 cls[NUM_SMALL_CLASSES_V5]` (current/partial/full lists)
- **Characteristics**: resolves ptr→page→class in O(1); never calls mid_desc_lookup / hak_super_lookup / classify_ptr

### Cold Path: SmallColdIface_v5

- `small_cold_v5_refill_page()`: allocate a page
- `small_cold_v5_retire_page()`: return a page
- `small_cold_v5_remote_push()`: remote free
- `small_cold_v5_remote_drain()`: batched reclaim

### Segment: SmallSegmentBox_v5

- **Layout**: 2MiB Segment / 64KiB Page
- **Page metadata**: `SmallPageMetaV5 page_meta[]` (holds class_idx/used/capacity/freelist directly)
- **O(1) lookup**: direct access to `page_meta` via segment mask + page_idx
- **API**:
  - `small_segment_v5_acquire()`: acquire a segment
  - `small_segment_v5_page_meta_of()`: ptr → page_meta

### Policy/Learning: SmallPolicySnapshot_v5

- `route_kind` (TINY, SMALL_HEAP_V5, POOL_V1, etc.)
- `block_size` (actual size of each size class)
- `max_partial_pages` (upper bound on the partial list)
- The snapshot is recomputed when Route/Policy changes (lazy initialization)

---

## Design Core

### 1. O(1) ptr→page→class resolution

```c
// SmallSegment mask computation
SmallPageMetaV5* meta = small_segment_v5_page_meta_of(ptr);
// ↓ Index computed directly from the segment base: (ptr - base) / PAGE_SIZE
```

- No TinyHeap lookup
- No mid_desc_lookup / hak_super_lookup
- For any page owned by SmallSegment, class_idx is available immediately
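
The lookup above can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes each segment is aligned to its 2 MiB size so that masking a pointer recovers the segment base, and every type, field layout, constant, and function name ending in `Sketch`/`_sketch` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Geometry from the Segment section: 2 MiB segments, 64 KiB pages. */
#define SEG_SIZE_SKETCH      ((uintptr_t)2 << 20)
#define PAGE_SHIFT_SKETCH    16
#define PAGES_PER_SEG_SKETCH 32

_Static_assert((((uintptr_t)2 << 20) >> 16) == PAGES_PER_SEG_SKETCH,
               "2 MiB / 64 KiB must give 32 pages");

/* Hypothetical per-page metadata (fields named after the list in the doc). */
typedef struct SmallPageMetaSketch {
    void*    free_list;
    uint16_t used;
    uint16_t capacity;
    uint8_t  class_idx;
} SmallPageMetaSketch;

/* Hypothetical segment descriptor; base is assumed to be 2 MiB aligned. */
typedef struct SmallSegmentSketch {
    uintptr_t           base;
    SmallPageMetaSketch page_meta[PAGES_PER_SEG_SKETCH];
} SmallSegmentSketch;

/* Returns the metadata of the page containing ptr, or NULL if ptr is
 * outside this segment. No list walk and no table search is involved. */
static inline SmallPageMetaSketch*
page_meta_of_sketch(SmallSegmentSketch* seg, const void* ptr)
{
    uintptr_t p = (uintptr_t)ptr;
    /* Mask off the low 21 bits to recover the candidate segment base. */
    if ((p & ~(SEG_SIZE_SKETCH - 1)) != seg->base) {
        return NULL;
    }
    /* Shift instead of divide: (ptr - base) / PAGE_SIZE. */
    size_t page_idx = (size_t)((p - seg->base) >> PAGE_SHIFT_SKETCH);
    return &seg->page_meta[page_idx];
}
```

Once the metadata pointer is in hand, `class_idx` is a single field read, which is what lets the hot path skip the TinyHeap and mid/super lookups.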

### 2. Hot/Cold separation

- **Hot**: current/partial lists + TLS freelist (future)
- **Cold**: SmallSegment page pool + remote push/drain
- C7 ULTRA is kept as the L0 lane; v5 runs unaffected by it

### 3. Coexistence with C7 ULTRA

- C7 ULTRA will be considered in a later phase as a "dedicated lane for ultra-hot classes"
- v5 does not depend on C7 ULTRA (v5 works even with ULTRA OFF)
- Segment/Policy can be shared (internal implementation to be worked out later)

### 4. Target classes

- **Initial**: C6 (257–768B) → C5 (129–256B)
- **Future**: C4 (65–128B) and below
- C7 (1024B) is adequately served by the ULTRA lane, but an optimize lane on the v5 side may be considered if needed

---

## Phase Plan

### Phase v5-0: types, interfaces, and ENV only (fully OFF)

- `SmallObjectHotBox_v5_box.h`: type definitions
- `SmallSegmentBox_v5.h`: Segment structure definitions
- `SmallColdIface_v5.h`: Cold function declarations (stubs)
- `SmallObjectV5_env_box.h`: ENV gate (HAKMEM_SMALL_HEAP_V5_ENABLED=0 by default)
- `smallobject_hotbox_v5.c`: HotBox implementation stub (fallback)
- **Behavior**: completely unchanged; the v5 route is never called

### Phase v5-1: C6-only v5 route stub (front path only)

- Add `TINY_ROUTE_SMALL_HEAP_V5` to tiny_route
- Setting `HAKMEM_SMALL_HEAP_V5_ENABLED=1 HAKMEM_SMALL_HEAP_V5_CLASSES=0x40` in the ENV sends C6 to the v5 route
- The body falls back to v1/pool → A/B against the v5-0 stage (confirm that passing through the route is OK)
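
The ENV gate for this phase might look like the sketch below. It assumes that bit *i* of `HAKMEM_SMALL_HEAP_V5_CLASSES` selects class C*i* (consistent with `0x40` selecting C6, but not confirmed by the source); the struct and both function names are hypothetical, and the real env box may cache or parse these values differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical snapshot of the two v5 gate variables. */
typedef struct SmallV5EnvSketch {
    bool     enabled;     /* HAKMEM_SMALL_HEAP_V5_ENABLED, default OFF */
    unsigned class_mask;  /* HAKMEM_SMALL_HEAP_V5_CLASSES, bit i = class Ci (assumed) */
} SmallV5EnvSketch;

/* Reads the gate once; a caller would normally keep the result around. */
static SmallV5EnvSketch small_v5_env_load_sketch(void)
{
    SmallV5EnvSketch env = { false, 0u };
    const char* en = getenv("HAKMEM_SMALL_HEAP_V5_ENABLED");
    const char* cm = getenv("HAKMEM_SMALL_HEAP_V5_CLASSES");
    env.enabled = (en != NULL && en[0] == '1');
    if (cm != NULL) {
        /* Base 0 lets strtoul accept both "0x40" and "64". */
        env.class_mask = (unsigned)strtoul(cm, NULL, 0);
    }
    return env;
}

/* True when class_idx should take the v5 route. */
static bool small_v5_class_routed_sketch(const SmallV5EnvSketch* env, int class_idx)
{
    if (class_idx < 0 || class_idx >= 32) {
        return false;
    }
    return env->enabled && ((env->class_mask >> class_idx) & 1u) != 0;
}
```

With the gate off, or with the C6 bit cleared, every class keeps its existing route, which matches the "behavior completely unchanged" requirement of v5-0.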

### Phase v5-2: C6-only v5 full implementation (Segment + Page + TLS freelist)

Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
- 2MiB segment / 64KiB page allocation
- O(1) ptr→page_meta lookup with segment masking
- C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)
Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
- TLS ownership fast-path skip page_meta for 90%+ of frees
- Batch header writes during refill (32 allocs = 1 header write)
- TLS batch refill (1/32 refill frequency)
- C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅
Phase v6-4: Mixed hang fix (segment metadata lookup correction)
- Root cause: metadata lookup was reading mmap region instead of TLS slot
- Fix: use TLS slot descriptor with in_use validation
- Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅
Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
- Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
- Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
- Doxygen-style comments for all public functions
- Result: 0 compiler warnings, maintained +1.2% performance
Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)
Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅
- Build: 0 warnings, fully documented ✅
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 15:29:59 +09:00

- Implement SmallSegment v5 allocation and page carving.
- Implement SmallHeapCtx v5 alloc/free.
- A/B against v1 on the C6-heavy benchmark. The initial v5-2 was much slower at ~14.7M ops/s; it was subsequently slimmed down in v5-3.

### Phase v5-3: C6-only v5 slimming (HotPath cleanup)

- Slim down the HotPath for C6 v5 and remove the O(n) components of v5-2.
- Adopt an O(1) `page_meta_of` based on a single TLS segment plus mask/shift.
- Make the free-page search O(1) with a bitmap plus `__builtin_ctz()`.
- Keep the partial list minimal (e.g. one page) to cut down current/partial list traversal.
- On C6-heavy 1M/400, throughput improved from ~14.7M ops/s in v5-2 to ~38.5M ops/s (still slower than the v1 baseline of ~44.9M).
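
The bitmap-based free-page search can be sketched as below. The sketch assumes one bit per page with a set bit meaning "free", and a 32-bit word because a 2 MiB segment holds 32 pages of 64 KiB; the type and function names are hypothetical. `__builtin_ctz` is a GCC/Clang builtin whose result is undefined for an input of 0, so the empty case is checked first.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per page in the segment; a set bit means the page is free. */
typedef struct PageBitmapSketch {
    uint32_t free_bits;
} PageBitmapSketch;

/* Returns the index of the lowest free page and marks it used,
 * or -1 when no page is free. No loop over pages is needed. */
static inline int page_bitmap_take_sketch(PageBitmapSketch* bm)
{
    if (bm->free_bits == 0) {
        return -1;                              /* avoid __builtin_ctz(0) */
    }
    int idx = __builtin_ctz(bm->free_bits);     /* position of lowest set bit */
    bm->free_bits &= bm->free_bits - 1;         /* clear that bit */
    return idx;
}

/* Marks page idx as free again (e.g. when the page is retired). */
static inline void page_bitmap_release_sketch(PageBitmapSketch* bm, int idx)
{
    assert(idx >= 0 && idx < 32);
    bm->free_bits |= (uint32_t)1 << idx;
}
```

Compared with walking a list of page descriptors, the cost here is independent of how many pages are in use.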

### Phase v5-4: C6 v5 header light / freelist tuning (research box)

- Goal: on C6-heavy 1M/400, shrink the regression with v5 ON to about -5% to -7% versus baseline (currently about -14%).
- Introduce `HAKMEM_SMALL_HEAP_V5_HEADER_MODE=full|light`; in light mode:
  - Write the header with `tiny_region_id_write_header` only when a page is carved.
  - `small_alloc_fast_v5` does not rewrite the header per allocation (validation on the free side still only reads the header, as before).
- Remove the extra memcpy and duplicate reads/writes from the C6 v5 freelist operations, reducing them to plain SLL push/pop (no TLS structure is added).
- Measured: on C6-heavy, v5 full 38.97M → v5 light 39.25M (+0.7%), which is still a large regression against the v5 OFF baseline of ~47.95M. On Mixed, v5 light is also about -13% versus baseline.
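
A "plain SLL push/pop" freelist of the kind described above can be sketched as follows. This assumes an intrusive list where the first word of a free block stores the next pointer; where the link actually lives relative to the block header in v5 is not stated in this document, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive freelist: each free block's first word is "next". */
typedef struct FreeListSketch {
    void* head;
} FreeListSketch;

/* Push: one store into the block, one store to the head. No memcpy. */
static inline void freelist_push_sketch(FreeListSketch* fl, void* block)
{
    *(void**)block = fl->head;
    fl->head = block;
}

/* Pop: one load from the block, one store to the head. NULL when empty. */
static inline void* freelist_pop_sketch(FreeListSketch* fl)
{
    void* block = fl->head;
    if (block != NULL) {
        fl->head = *(void**)block;
    }
    return block;
}
```

Blocks must be at least pointer-sized and pointer-aligned for this to be valid, which holds for the C6 size range (257–768B) under ordinary allocator alignment.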

### Phase v5-5: C6 v5 TLS cache (research box)

- Goal: reduce page_meta accesses on the C6 v5 HotPath, aiming for an improvement of roughly +1–2% (research box).
- Introduce `HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED=0|1` and add a one-slot TLS cache for C6 (`c6_cached_block` etc.) to SmallHeapCtxV5.
- alloc: on a cache hit, return the block without touching page_meta. On a cache miss, fall back to the existing page freelist path.
- free: if the cache is empty, store the block in the cache; if it is occupied, send the block down the existing freelist path.
- Measured (1M/400, HEADER_MODE=full, v5 ON):
  - C6-heavy (257–768B): cache OFF 35.53M → cache ON 37.02M ops/s (+4.2%)
  - Mixed 16–1024B: cache OFF 38.04M → cache ON 37.93M ops/s (-0.3%, within noise)
- Combining header light + cache has been observed to loop because of a freelist/header collision, so for now only "header full + cache" is guaranteed to work. v5 remains a research box, and mainline mid/smallmid continues to be judged against pool v1.
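
The one-slot cache behavior described in the alloc/free bullets can be sketched like this. Only the field name `c6_cached_block` comes from the document; the reduced context struct and the two helper names are hypothetical, and the context is assumed to be per-thread so no synchronization is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced per-thread context holding only the one-slot C6 cache. */
typedef struct C6CacheSketch {
    void* c6_cached_block;
} C6CacheSketch;

/* alloc side: returns the cached block (hit) or NULL (miss).
 * On a miss the caller falls back to the page freelist path. */
static inline void* c6_cache_take_sketch(C6CacheSketch* ctx)
{
    void* block = ctx->c6_cached_block;
    ctx->c6_cached_block = NULL;
    return block;
}

/* free side: returns 1 if the block was absorbed by the cache,
 * 0 if the slot was occupied and the caller must use the freelist path. */
static inline int c6_cache_put_sketch(C6CacheSketch* ctx, void* block)
{
    if (ctx->c6_cached_block != NULL) {
        return 0;
    }
    ctx->c6_cached_block = block;
    return 1;
}
```

Neither helper reads page metadata, which is the point of the cache: an alloc immediately following a free of the same class never touches `page_meta`.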

### Phase v5-6: C6 v5 TLS batching (design complete, implementation pending)

**Goal**: reduce refill frequency and aim for a further **+3–5%** over v5 full+cache on C6-heavy (research box).

**ENV gate**:

```c
// Add to smallobject_v5_env_box.h
HAKMEM_SMALL_HEAP_V5_BATCH_ENABLED=0|1 // default 0
HAKMEM_SMALL_HEAP_V5_BATCH_SIZE=N // default 4
```

**Batch structure**:

```c
#include <stdbool.h>
#include <stdint.h>

#define SMALL_V5_BATCH_CAP 4

typedef struct SmallV5Batch {
    void*   slots[SMALL_V5_BATCH_CAP];  // holds BASE pointers
    uint8_t count;                      // number of blocks currently in the batch
} SmallV5Batch;

// SmallClassHeapV5 and NUM_SMALL_CLASSES_V5 are defined elsewhere.
typedef struct SmallHeapCtxV5 {
    SmallClassHeapV5 cls[NUM_SMALL_CLASSES_V5];
    uint8_t          header_mode;
    bool             tls_cache_enabled;
    void*            c6_cached_block;   // v5-5 TLS cache (1-slot)
    bool             batch_enabled;
    SmallV5Batch     c6_batch;          // v5-6 TLS batch (4-slot)
} SmallHeapCtxV5;
```

**alloc path design** (in priority order):

```
1. TLS cache hit (c6_cached_block != NULL)
   → return immediately (page_meta is not touched)

2. batch_enabled && c6_batch.count > 0
   → --count; return slots[count]; (page_meta is not touched)

3. Existing page freelist / refill path
   → pop from page_meta->free_list
   → if empty, refill via alloc_slow_v5()
```

**free path design** (in priority order):

```
1. header/magic check + page_meta_of(ptr) to obtain the page

2. If the TLS cache is empty, store the block in the cache (existing v5-5 behavior)
   → return

3. batch_enabled && c6_batch.count < SMALL_V5_BATCH_CAP
   → slots[count++] = ptr; return;
   → page->used is not updated (blocks in the batch are treated as "hot reserved")

4. Batch full → existing freelist push path
   → page->used--; list transition logic
```
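
The alloc and free priority orders can be put together in one sketch. It covers only the cache and batch steps (alloc steps 1–2, free steps 2–3); header validation, the page freelist, and refill are left to the caller. The reduced context struct and the two function names are hypothetical, the TLS cache is assumed to be enabled, and `page->used` is left untouched as in the "hot reserved" treatment described in free step 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_V5_BATCH_CAP 4

typedef struct SmallV5Batch {
    void*   slots[SMALL_V5_BATCH_CAP];  /* BASE pointers */
    uint8_t count;                      /* blocks currently held */
} SmallV5Batch;

/* Reduced, hypothetical context: only what the fast paths touch. */
typedef struct C6FastCtxSketch {
    void*        c6_cached_block;
    bool         batch_enabled;
    SmallV5Batch c6_batch;
} C6FastCtxSketch;

/* alloc steps 1-2. NULL means: continue with the page freelist / refill. */
static inline void* c6_alloc_fast_sketch(C6FastCtxSketch* ctx)
{
    if (ctx->c6_cached_block != NULL) {            /* 1. TLS cache hit */
        void* block = ctx->c6_cached_block;
        ctx->c6_cached_block = NULL;
        return block;
    }
    if (ctx->batch_enabled && ctx->c6_batch.count > 0) {   /* 2. batch pop */
        return ctx->c6_batch.slots[--ctx->c6_batch.count];
    }
    return NULL;
}

/* free steps 2-3, called after the header/page check has passed.
 * false means: batch is full, continue with the freelist push path. */
static inline bool c6_free_fast_sketch(C6FastCtxSketch* ctx, void* base)
{
    if (ctx->c6_cached_block == NULL) {            /* 2. fill empty cache */
        ctx->c6_cached_block = base;
        return true;
    }
    if (ctx->batch_enabled && ctx->c6_batch.count < SMALL_V5_BATCH_CAP) {
        ctx->c6_batch.slots[ctx->c6_batch.count++] = base;  /* 3. batch push */
        return true;                               /* page->used untouched */
    }
    return false;
}
```

With a capacity of 4 plus the one-slot cache, up to five consecutive frees are absorbed before the page freelist is touched, and the batch hands blocks back in LIFO order.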

**Implementation notes (Box Theory)**:

- Keep everything inside HotBox_v5 (not exposed to ColdIface/SegmentBox)
- C6 only (a `class_idx == C6` guard is mandatory)
- Assumes header full (consistency with light is deferred to a later phase)
- Handling of page->used for blocks in the batch:
  - Option A: do not touch used (the batch is "hot reserved") → simple implementation
  - Option B: used-- when storing into the batch, used++ when taking out → accurate page statistics

**A/B plan**:

```bash
# C6-heavy (baseline: v5 full+cache ON = 37.02M)
# Run once with batching OFF (0) and once with it ON (1).
for batch in 0 1; do
  HAKMEM_PROFILE=C6_HEAVY_LEGACY_POOLV1 \
  HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=768 \
  HAKMEM_SMALL_HEAP_V5_ENABLED=1 \
  HAKMEM_SMALL_HEAP_V5_CLASSES=0x40 \
  HAKMEM_SMALL_HEAP_V5_HEADER_MODE=full \
  HAKMEM_SMALL_HEAP_V5_TLS_CACHE_ENABLED=1 \
  HAKMEM_SMALL_HEAP_V5_BATCH_ENABLED="$batch" \
  ./bench_random_mixed_hakmem 1000000 400 1
done

# Expected: 37.02M → 38-39M ops/s with batch ON (+3-5%)
```
|
2025-12-11 03:09:57 +09:00
|
|
|
|
|
Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
- 2MiB segment / 64KiB page allocation
- O(1) ptr→page_meta lookup with segment masking
- C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)
Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
- TLS ownership fast-path skip page_meta for 90%+ of frees
- Batch header writes during refill (32 allocs = 1 header write)
- TLS batch refill (1/32 refill frequency)
- C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅
Phase v6-4: Mixed hang fix (segment metadata lookup correction)
- Root cause: metadata lookup was reading mmap region instead of TLS slot
- Fix: use TLS slot descriptor with in_use validation
- Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅
Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
- Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
- Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
- Doxygen-style comments for all public functions
- Result: 0 compiler warnings, maintained +1.2% performance
Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)
Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅
- Build: 0 warnings, fully documented ✅
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 15:29:59 +09:00

**Target performance**:

| Phase | C6-heavy ops/s | vs baseline |
|-------|---------------|-------------|
| v5 OFF (baseline) | 47.95M | - |
| v5-3 (O(1) lookup) | 38.5M | -20% |
| v5-4 (header light) | 39.25M | -18% |
| v5-5 (+ cache, full) | 37.02M | -23% |
| v5-6 (+ batch, full) | 37.78M | -21% |

### Phase v5-7: Applying the ULTRA pattern to C6 v5 (design proposal)

The current v5 stacks cache, batch, and page_meta on top of each other; as a result its per-operation cost is higher than v1/pool even for C6-only workloads, and a regression of roughly -20% remains.
C7 ULTRA (2MiB Segment + 64KiB Page + TLS freelist + mask free) has measured +50%, so the C6 v5 design switches to an ULTRA-style HotPath as well.

**Goal**:

- On C6-heavy 1M/400, raise C6 v5 to a level close to or above the current v1/pool (~45–50M ops/s), while keeping it a research box.

**Redesign of HotBox_v5 for C6 (proposal)**:

- Give `SmallHeapCtxV5` a C6-dedicated TLS freelist, and use the existing cache/batch only on the slow/refill side.

```c
#include <stdbool.h>
#include <stdint.h>

// SmallClassHeapV5, SmallV5Batch, and NUM_SMALL_CLASSES_V5 are defined elsewhere in v5.
typedef struct SmallHeapCtxV5 {
    SmallClassHeapV5 cls[NUM_SMALL_CLASSES_V5];
    uint8_t          header_mode;
    bool             tls_cache_enabled;
    SmallV5Batch     c6_batch;  // v5-6 (can be reused on the slow side)

    // v5-7: TLS freelist for C6 ULTRA (32 slots assumed)
    void*   c6_tls_freelist[32];  // BASE pointers
    uint8_t c6_tls_count;         // number of valid entries in c6_tls_freelist
} SmallHeapCtxV5;
```

**C6 alloc/free HotPath (ULTRA pattern proposal)**:

- alloc (when `class_idx == C6` and ULTRA_C6 is enabled):

```c
if (likely(ctx->c6_tls_count > 0)) {
    return ctx->c6_tls_freelist[--ctx->c6_tls_count];
}
// Empty → refill (slow path)
return small_alloc_slow_v5_c6_refill(ctx);
```

- free:

```c
if (likely(ctx->c6_tls_count < 32)) {
    ctx->c6_tls_freelist[ctx->c6_tls_count++] = base_ptr;
    return;
}
// Full → drain (slow path)
small_free_slow_v5_c6_drain(base_ptr, ctx);
```

- The HotPath consists only of a TLS array pop/push plus a single branch. Segment, page_meta, header, and cold_iface are all handled through refill/drain.

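The pop/push hot path can be exercised in isolation with a small stand-alone model. This is only a sketch under stated assumptions: `DemoCtx`, `demo_likely`, `demo_slow_refill`, and `demo_slow_drain` are hypothetical stand-ins for `SmallHeapCtxV5`, `likely`, and the proposed slow-path functions, and the slow paths here merely count calls so the fast/slow split is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_C6_TLS_SLOTS 32  // matches the 32-slot assumption in the design

#if defined(__GNUC__)
#define demo_likely(x) __builtin_expect(!!(x), 1)
#else
#define demo_likely(x) (x)
#endif

typedef struct {
    void*    c6_tls_freelist[DEMO_C6_TLS_SLOTS];  // BASE pointers
    uint8_t  c6_tls_count;
    unsigned slow_refills;  // demo-only counters
    unsigned slow_drains;
} DemoCtx;

// Stand-in slow paths: a real refill would carve blocks from a page.
static void* demo_slow_refill(DemoCtx* ctx) {
    ctx->slow_refills++;
    return NULL;
}

static void demo_slow_drain(void* base_ptr, DemoCtx* ctx) {
    (void)base_ptr;
    ctx->slow_drains++;
}

// Alloc: pop from the TLS array, or fall back to the slow path when empty.
static inline void* demo_c6_alloc(DemoCtx* ctx) {
    if (demo_likely(ctx->c6_tls_count > 0)) {
        return ctx->c6_tls_freelist[--ctx->c6_tls_count];
    }
    return demo_slow_refill(ctx);
}

// Free: push onto the TLS array, or fall back to the slow path when full.
static inline void demo_c6_free(DemoCtx* ctx, void* base_ptr) {
    assert(base_ptr != NULL);
    if (demo_likely(ctx->c6_tls_count < DEMO_C6_TLS_SLOTS)) {
        ctx->c6_tls_freelist[ctx->c6_tls_count++] = base_ptr;
        return;
    }
    demo_slow_drain(base_ptr, ctx);
}
```

The array behaves as a LIFO stack, so the most recently freed block is the next one handed out, which tends to keep the working set cache-warm.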
**Role of the slow path (refill/drain) (proposal)**:

- `small_alloc_slow_v5_c6_refill`:
  - Take one C6 page from Segment v5 and carve multiple blocks from it (e.g. 32).
  - Write headers at carve time according to `header_mode` (full/light).
  - Distribute the carved blocks between `c6_tls_freelist[]` and the existing v5 page freelist.
- `small_free_slow_v5_c6_drain`:
  - Hand blocks that arrive while the TLS freelist is full to the existing v5 page freelist / retire logic.

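The carve step of the refill can be sketched roughly as below. All names (`DemoPage`, `DemoTls`, `demo_refill`) are hypothetical, as are the 512-byte block size and the one-byte header at the block base; only the 64KiB page size and the 32-slot TLS array come from the design. Blocks that are not carved simply remain behind the page's carve cursor, which stands in for the page freelist here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE   (64u * 1024u)  // 64KiB page, as in the design
#define DEMO_BLOCK_SIZE  512u           // hypothetical C6 block size
#define DEMO_TLS_SLOTS   32u
#define DEMO_HEADER_BYTE 0xA6u          // hypothetical header value

typedef struct {
    uint8_t* base;      // start of the page payload
    uint32_t carved;    // number of blocks carved so far
    uint32_t capacity;  // total number of blocks in the page
} DemoPage;

typedef struct {
    void*   tls[DEMO_TLS_SLOTS];  // BASE pointers
    uint8_t count;
} DemoTls;

// Carve up to `want` blocks from the page into the TLS array.
// Stops early when the TLS array is full or the page is exhausted.
// Returns the number of blocks actually carved.
static uint32_t demo_refill(DemoTls* t, DemoPage* p, uint32_t want) {
    assert(p->base != NULL);
    uint32_t n = 0;
    while (n < want && t->count < DEMO_TLS_SLOTS && p->carved < p->capacity) {
        uint8_t* blk = p->base + (size_t)p->carved * DEMO_BLOCK_SIZE;
        blk[0] = DEMO_HEADER_BYTE;  // header written once, at carve time
        t->tls[t->count++] = blk;
        p->carved++;
        n++;
    }
    return n;
}
```

Writing the header inside this loop is what lets the alloc/free hot path avoid touching it afterwards.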
**ENV gate (proposal)**:

- `HAKMEM_SMALL_HEAP_V5_ULTRA_C6_ENABLED=0|1` (default 0).
- The route stays `TINY_ROUTE_SMALL_HEAP_V5` as before; inside `small_alloc_fast_v5` / `small_free_fast_v5`:
  - `if (!ULTRA_C6_ENABLED || class_idx != C6)` → existing v5 path (including cache/batch).
  - `if (ULTRA_C6_ENABLED && class_idx == C6)` → the TLS 32-slot ULTRA path described above.

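One way the gate could be wired up is sketched below. The environment variable name comes from the proposal; everything else (`demo_parse_flag`, `demo_ultra_c6_enabled`, `demo_select_path`, `DemoPath`, and the assumption that C6 is class index 6, inferred from `HAKMEM_SMALL_HEAP_V5_CLASSES=0x40`) is hypothetical. The cached flag is not synchronized, so a real implementation would presumably initialize it once at startup or per thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_C6_CLASS_IDX 6  // assumption: 0x40 == bit 6 == C6

typedef enum { DEMO_PATH_V5_EXISTING, DEMO_PATH_ULTRA_C6 } DemoPath;

// Only the exact string "1" enables the gate; unset or anything else is OFF.
static bool demo_parse_flag(const char* s) {
    return s != NULL && s[0] == '1' && s[1] == '\0';
}

// Read the environment once and cache the result (default 0 when unset).
static bool demo_ultra_c6_enabled(void) {
    static int cached = -1;
    if (cached < 0) {
        const char* v = getenv("HAKMEM_SMALL_HEAP_V5_ULTRA_C6_ENABLED");
        cached = demo_parse_flag(v) ? 1 : 0;
    }
    return cached == 1;
}

// Route selection: ULTRA only for C6 with the gate ON, otherwise existing v5.
static DemoPath demo_select_path(bool ultra_enabled, int class_idx) {
    assert(class_idx >= 0);
    if (ultra_enabled && class_idx == DEMO_C6_CLASS_IDX) {
        return DEMO_PATH_ULTRA_C6;
    }
    return DEMO_PATH_V5_EXISTING;
}
```

A caller would pass `demo_ultra_c6_enabled()` as the first argument of `demo_select_path`; keeping the two separate makes the dispatch logic testable without touching the environment.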
**Relationship to header light**:

- In the ULTRA path, the header is written only at refill/carve time and is not touched in alloc/free (this avoids collisions between the freelist pointer and the header).
- Implement the ULTRA path with `header_mode=full` first, then verify compatibility with light step by step.

**A/B outline**:

- C6-heavy (1M/400, v5 ON, ULTRA_C6 ON/OFF):
  - Using v5-6 (cache+batch) as the reference, expect a +30–50% improvement with ULTRA_C6 ON (the first priority is no SEGV or hang).
- Mixed 16–1024B:
  - Confirm that the impact of ULTRA_C6 ON on overall Mixed throughput stays within roughly ± a few percent to 10% (treated as a C6-heavy-only option).