Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
- 2MiB segment / 64KiB page allocation
- O(1) ptr→page_meta lookup with segment masking
- C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)
Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
- TLS ownership fast-path skip page_meta for 90%+ of frees
- Batch header writes during refill (32 allocs = 1 header write)
- TLS batch refill (1/32 refill frequency)
- C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅
Phase v6-4: Mixed hang fix (segment metadata lookup correction)
- Root cause: metadata lookup was reading mmap region instead of TLS slot
- Fix: use TLS slot descriptor with in_use validation
- Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅
Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
- Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
- Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
- Doxygen-style comments for all public functions
- Result: 0 compiler warnings, maintained +1.2% performance
Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)
Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅
- Build: 0 warnings, fully documented ✅
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 15:29:59 +09:00
|
|
|
|
// smallobject_cold_iface_v6.c - SmallObject ColdIface v6 実装(Phase v6-3)
|
|
|
|
|
|
|
|
|
|
|
|
#include <stdlib.h>
|
|
|
|
|
|
#include <string.h>
|
|
|
|
|
|
#include <stdint.h>
|
|
|
|
|
|
#include "box/smallobject_cold_iface_v6.h"
|
|
|
|
|
|
#include "box/smallsegment_v6_box.h"
|
|
|
|
|
|
|
|
|
|
|
|
#ifndef likely
|
|
|
|
|
|
#define likely(x) __builtin_expect(!!(x), 1)
|
|
|
|
|
|
#define unlikely(x) __builtin_expect(!!(x), 0)
|
|
|
|
|
|
#endif
|
|
|
|
|
|
|
2025-12-11 15:50:14 +09:00
|
|
|
|
// Refill page for given class (C6 and C5 in v6-5)
|
Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
- 2MiB segment / 64KiB page allocation
- O(1) ptr→page_meta lookup with segment masking
- C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)
Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
- TLS ownership fast-path skip page_meta for 90%+ of frees
- Batch header writes during refill (32 allocs = 1 header write)
- TLS batch refill (1/32 refill frequency)
- C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅
Phase v6-4: Mixed hang fix (segment metadata lookup correction)
- Root cause: metadata lookup was reading mmap region instead of TLS slot
- Fix: use TLS slot descriptor with in_use validation
- Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅
Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
- Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
- Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
- Doxygen-style comments for all public functions
- Result: 0 compiler warnings, maintained +1.2% performance
Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)
Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅
- Build: 0 warnings, fully documented ✅
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 15:29:59 +09:00
|
|
|
|
SmallPageMetaV6* small_cold_v6_refill_page(uint32_t class_idx) {
|
2025-12-11 15:50:14 +09:00
|
|
|
|
// v6-5: Support C5 and C6
|
|
|
|
|
|
size_t block_size;
|
|
|
|
|
|
if (class_idx == SMALL_V6_C6_CLASS_IDX) {
|
|
|
|
|
|
block_size = SMALL_V6_C6_BLOCK_SIZE; // 512
|
|
|
|
|
|
} else if (class_idx == SMALL_V6_C5_CLASS_IDX) {
|
|
|
|
|
|
block_size = SMALL_V6_C5_BLOCK_SIZE; // 256
|
|
|
|
|
|
} else {
|
|
|
|
|
|
return NULL; // Unsupported class
|
Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
- 2MiB segment / 64KiB page allocation
- O(1) ptr→page_meta lookup with segment masking
- C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)
Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
- TLS ownership fast-path skip page_meta for 90%+ of frees
- Batch header writes during refill (32 allocs = 1 header write)
- TLS batch refill (1/32 refill frequency)
- C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅
Phase v6-4: Mixed hang fix (segment metadata lookup correction)
- Root cause: metadata lookup was reading mmap region instead of TLS slot
- Fix: use TLS slot descriptor with in_use validation
- Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅
Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
- Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
- Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
- Doxygen-style comments for all public functions
- Result: 0 compiler warnings, maintained +1.2% performance
Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)
Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅
- Build: 0 warnings, fully documented ✅
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 15:29:59 +09:00
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// Get or acquire TLS segment
|
|
|
|
|
|
SmallSegmentV6* seg = small_segment_v6_acquire_for_thread();
|
|
|
|
|
|
if (unlikely(!seg)) {
|
|
|
|
|
|
return NULL;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// Find an available page (simple linear scan)
|
|
|
|
|
|
SmallPageMetaV6* page = NULL;
|
|
|
|
|
|
for (uint32_t i = 0; i < seg->num_pages; i++) {
|
|
|
|
|
|
if (seg->page_meta[i].capacity == 0) {
|
|
|
|
|
|
page = &seg->page_meta[i];
|
|
|
|
|
|
break;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
if (unlikely(!page)) {
|
|
|
|
|
|
return NULL; // All pages in use
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2025-12-11 15:50:14 +09:00
|
|
|
|
// Initialize page metadata
|
Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
- 2MiB segment / 64KiB page allocation
- O(1) ptr→page_meta lookup with segment masking
- C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)
Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
- TLS ownership fast-path skip page_meta for 90%+ of frees
- Batch header writes during refill (32 allocs = 1 header write)
- TLS batch refill (1/32 refill frequency)
- C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅
Phase v6-4: Mixed hang fix (segment metadata lookup correction)
- Root cause: metadata lookup was reading mmap region instead of TLS slot
- Fix: use TLS slot descriptor with in_use validation
- Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅
Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
- Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
- Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
- Doxygen-style comments for all public functions
- Result: 0 compiler warnings, maintained +1.2% performance
Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)
Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅
- Build: 0 warnings, fully documented ✅
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 15:29:59 +09:00
|
|
|
|
page->class_idx = (uint8_t)class_idx;
|
2025-12-11 15:50:14 +09:00
|
|
|
|
page->capacity = SMALL_PAGE_V6_SIZE / block_size; // C6: 128, C5: 256
|
Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
- 2MiB segment / 64KiB page allocation
- O(1) ptr→page_meta lookup with segment masking
- C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)
Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
- TLS ownership fast-path skip page_meta for 90%+ of frees
- Batch header writes during refill (32 allocs = 1 header write)
- TLS batch refill (1/32 refill frequency)
- C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅
Phase v6-4: Mixed hang fix (segment metadata lookup correction)
- Root cause: metadata lookup was reading mmap region instead of TLS slot
- Fix: use TLS slot descriptor with in_use validation
- Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅
Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
- Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
- Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
- Doxygen-style comments for all public functions
- Result: 0 compiler warnings, maintained +1.2% performance
Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)
Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅
- Build: 0 warnings, fully documented ✅
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 15:29:59 +09:00
|
|
|
|
page->used = 0;
|
|
|
|
|
|
page->flags = 0;
|
|
|
|
|
|
|
|
|
|
|
|
// Build freelist for the page
|
|
|
|
|
|
uintptr_t page_offset = (uintptr_t)page->page_idx * SMALL_PAGE_V6_SIZE;
|
|
|
|
|
|
uintptr_t page_base = seg->base + page_offset;
|
|
|
|
|
|
uint8_t* base = (uint8_t*)page_base;
|
|
|
|
|
|
|
|
|
|
|
|
// Build intrusive freelist (last to first for cache locality)
|
|
|
|
|
|
void* freelist = NULL;
|
|
|
|
|
|
for (int i = (int)page->capacity - 1; i >= 0; i--) {
|
2025-12-11 15:50:14 +09:00
|
|
|
|
uint8_t* block = base + ((size_t)i * block_size);
|
Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
- 2MiB segment / 64KiB page allocation
- O(1) ptr→page_meta lookup with segment masking
- C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)
Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
- TLS ownership fast-path skip page_meta for 90%+ of frees
- Batch header writes during refill (32 allocs = 1 header write)
- TLS batch refill (1/32 refill frequency)
- C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅
Phase v6-4: Mixed hang fix (segment metadata lookup correction)
- Root cause: metadata lookup was reading mmap region instead of TLS slot
- Fix: use TLS slot descriptor with in_use validation
- Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅
Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
- Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
- Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
- Doxygen-style comments for all public functions
- Result: 0 compiler warnings, maintained +1.2% performance
Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)
Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅
- Build: 0 warnings, fully documented ✅
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-11 15:29:59 +09:00
|
|
|
|
|
|
|
|
|
|
// Build freelist using BASE pointers
|
|
|
|
|
|
void* next = freelist;
|
|
|
|
|
|
memcpy(block, &next, sizeof(void*));
|
|
|
|
|
|
freelist = block;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
page->free_list = freelist;
|
|
|
|
|
|
|
|
|
|
|
|
return page;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// Retire page (simple reset for v6-2)
|
|
|
|
|
|
void small_cold_v6_retire_page(SmallPageMetaV6* page) {
|
|
|
|
|
|
if (unlikely(!page)) {
|
|
|
|
|
|
return;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// v6-2: Simple reset (no actual deallocation)
|
|
|
|
|
|
page->free_list = NULL;
|
|
|
|
|
|
page->used = 0;
|
|
|
|
|
|
page->capacity = 0;
|
|
|
|
|
|
page->class_idx = 0;
|
|
|
|
|
|
page->flags = 0;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// Remote operations (dummy for v6-2, C6-heavy is mostly same-thread)
|
|
|
|
|
|
void small_cold_v6_remote_push(SmallPageMetaV6* page, void* ptr, uint32_t tid) {
|
|
|
|
|
|
(void)page;
|
|
|
|
|
|
(void)ptr;
|
|
|
|
|
|
(void)tid;
|
|
|
|
|
|
// Not implemented in v6-2
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
void small_cold_v6_remote_drain(SmallHeapCtxV6* ctx) {
|
|
|
|
|
|
(void)ctx;
|
|
|
|
|
|
// Not implemented in v6-2
|
|
|
|
|
|
}
|