Phase 1: Box Theory refactoring + include reduction
Phase 1-1: Split hakmem_tiny_free.inc (1,711 → 452 lines, -73%)
- Created tiny_free_magazine.inc.h (413 lines) - Magazine layer
- Created tiny_superslab_alloc.inc.h (394 lines) - SuperSlab alloc
- Created tiny_superslab_free.inc.h (305 lines) - SuperSlab free
Phase 1-2++: Refactor hakmem_pool.c (1,481 → 907 lines, -38.8%)
- Created pool_tls_types.inc.h (32 lines) - TLS structures
- Created pool_mf2_types.inc.h (266 lines) - MF2 data structures
- Created pool_mf2_helpers.inc.h (158 lines) - Helper functions
- Created pool_mf2_adoption.inc.h (129 lines) - Adoption logic
Phase 1-3: Reduce hakmem_tiny.c includes (60 → 46, -23.3%)
- Created tiny_system.h - System headers umbrella (stdio, stdlib, etc.)
- Created tiny_api.h - API headers umbrella (stats, query, rss, registry)
Performance: 4.19M ops/s maintained (±0% regression)
Verified: Larson benchmark 2×8×128×1024 = 4,192,128 ops/s
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-06 21:54:12 +09:00
// tiny_free_magazine.inc.h - Magazine Layer for hak_tiny_free_with_slab()
// Purpose: TLS caching (TinyQuickSlot, TLS SLL, Magazine) and spill logic
// Extracted from: hakmem_tiny_free.inc lines 208-620
// Box Theory: Box 5 (TLS Cache) integration
//
// Context: This file is #included within hak_tiny_free_with_slab() function body
// Prerequisites: ss, meta, class_idx, ptr variables must be defined in calling scope

#if !HAKMEM_BUILD_RELEASE
2025-12-05 20:43:14 +09:00
if (!slab) {
    // SuperSlab uses Magazine for TLS caching (same as TinySlab)
    tiny_small_mags_init_once();
    if (class_idx > 3) tiny_mag_init_if_needed(class_idx);
    TinyTLSMag* mag = &g_tls_mags[class_idx];
    int cap = mag->cap;

    // 32/64B: prefer SLL (mag-first is disabled)
    // Prefer TinyQuickSlot (compile-out if HAKMEM_TINY_NO_QUICK)
#if !defined(HAKMEM_TINY_NO_QUICK)
    if (g_quick_enable && class_idx <= 4) {
        TinyQuickSlot* qs = &g_tls_quick[class_idx];
        if (__builtin_expect(qs->top < QUICK_CAP, 1)) {
Phase E1-CORRECT: Fix USER/BASE pointer conversion bugs in slab_index_for calls
CRITICAL BUG FIX: Phase E1 introduced 1-byte headers for ALL size classes (C0-C7),
changing the pointer contract. However, many locations still called slab_index_for()
with USER pointers (storage+1) instead of BASE pointers (storage), causing off-by-one
slab index calculations that corrupted memory.
Root Cause:
- USER pointer = BASE + 1 (returned by malloc, points past header)
- BASE pointer = storage start (where 1-byte header is written)
- slab_index_for() expects BASE pointer for correct slab boundary calculations
- Passing USER pointer → wrong slab_idx → wrong metadata → freelist corruption
Impact Before Fix:
- bench_random_mixed crashes at ~14K iterations with SEGV
- Massive C7 alignment check failures (wrong slab classification)
- Memory corruption from writing to wrong slab freelists
Fixes Applied (8 locations):
1. core/hakmem_tiny_free.inc:137
- Added USER→BASE conversion before slab_index_for()
2. core/hakmem_tiny_ultra_simple.inc:148
- Added USER→BASE conversion before slab_index_for()
3. core/tiny_free_fast.inc.h:220
- Added USER→BASE conversion before slab_index_for()
4-5. core/tiny_free_magazine.inc.h:126,315
- Added USER→BASE conversion before slab_index_for() (2 locations)
6. core/box/free_local_box.c:14,22,62
- Added USER→BASE conversion before slab_index_for()
- Fixed delta calculation to use BASE instead of USER
- Fixed debug logging to use BASE instead of USER
7. core/hakmem_tiny.c:448,460,473 (tiny_debug_track_alloc_ret)
- Added USER→BASE conversion before slab_index_for() (2 calls)
- Fixed delta calculation to use BASE instead of USER
- This function is called on EVERY allocation in debug builds
Results After Fix:
✅ bench_random_mixed stable up to 66K iterations (~4.7x improvement)
✅ C7 alignment check failures eliminated (was: 100% failure rate)
✅ Front Gate "Unknown" classification dropped to 0% (was: 1.67%)
✅ No segfaults for workloads up to ~33K allocations
Remaining Issue:
❌ Segfault still occurs at iteration 66152 (allocs=33137, frees=33014)
- Different bug from USER/BASE conversion issues
- Likely capacity/boundary condition (further investigation needed)
Testing:
- bench_random_mixed_hakmem 1K-66K iterations: PASS
- bench_random_mixed_hakmem 67K+ iterations: FAIL (different bug)
- bench_fixed_size_hakmem 200K iterations: PASS
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 05:21:36 +09:00
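The USER/BASE contract described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not hakmem's code: every `demo_` name is hypothetical (the real `hak_base_ptr_t` and `hak_user_to_base()` may be defined differently), and the slab-index helper assumes equally sized slabs laid out back to back. Wrapping the two pointer kinds in distinct struct types means that handing a USER pointer to a function that expects a BASE pointer fails to compile instead of silently computing from the wrong address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not the hakmem definitions. */
#define DEMO_HEADER_SIZE 1u /* 1-byte header, as stated in the commit message */

typedef struct { uint8_t* p; } demo_base_ptr_t; /* storage start (header byte) */
typedef struct { uint8_t* p; } demo_user_ptr_t; /* pointer handed to the caller */

/* BASE -> USER: skip the header byte. */
static demo_user_ptr_t demo_base_to_user(demo_base_ptr_t b) {
    assert(b.p != NULL);
    demo_user_ptr_t u = { b.p + DEMO_HEADER_SIZE };
    return u;
}

/* USER -> BASE: step back onto the header byte. */
static demo_base_ptr_t demo_user_to_base(demo_user_ptr_t u) {
    assert(u.p != NULL);
    demo_base_ptr_t b = { u.p - DEMO_HEADER_SIZE };
    return b;
}

/* Index of the slab that contains a BASE pointer (assumed layout:
 * slabs of slab_size bytes, contiguous from region_start).
 * Accepting only demo_base_ptr_t rejects USER pointers at compile time. */
static size_t demo_slab_index(const uint8_t* region_start, size_t slab_size,
                              demo_base_ptr_t b) {
    assert(slab_size > 0 && b.p >= region_start);
    return (size_t)(b.p - region_start) / slab_size;
}
```

On a free path built this way, the raw pointer from the caller would be wrapped as a USER pointer once, converted with `demo_user_to_base()`, and only the resulting BASE value passed to index or freelist helpers.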
            // Phase E1-CORRECT: ALL classes (C0-C7) have 1-byte header
2025-12-01 16:37:59 +09:00
            // Phase 10: Use hak_base_ptr_t
            hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
            qs->items[qs->top++] = HAK_BASE_TO_RAW(base_ptr);
            HAK_STAT_FREE(class_idx);
            return;
        }
    }
#endif

    // Fast path: TLS SLL push for hottest classes
2025-11-20 07:32:30 +09:00
    if (!g_tls_list_enable && g_tls_sll_enable && g_tls_sll[class_idx].count < sll_cap_for_class(class_idx, (uint32_t)cap)) {
2025-12-01 16:37:59 +09:00
        // Phase 10: Use hak_base_ptr_t
        hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
2025-11-10 16:48:20 +09:00
        uint32_t sll_cap = sll_cap_for_class(class_idx, (uint32_t)cap);
2025-12-01 16:37:59 +09:00
        if (tls_sll_push(class_idx, base_ptr, sll_cap)) {
2025-11-10 16:48:20 +09:00
            // BUGFIX: Decrement used counter (was missing, causing Fail-Fast on next free)
            meta->used--;
            // Active → Inactive: count down immediately (blocks held in TLS are not "in use")
            ss_active_dec_one(ss);
            HAK_TP1(sll_push, class_idx);
            tiny_debug_ring_record(TINY_RING_EVENT_FREE_LOCAL, (uint16_t)class_idx, ptr, 3);
            HAK_STAT_FREE(class_idx);
            return;
        }
    }

    // Next: Magazine push (if needed, bulk-transfer mag→SLL to make room)
    // Hysteresis: allow slight overfill before deciding to spill under lock
    if (mag->top >= cap && g_spill_hyst > 0) {
        (void)bulk_mag_to_sll_if_room(class_idx, mag, cap / 2);
    }
    if (mag->top < cap + g_spill_hyst) {
2025-12-01 16:37:59 +09:00
        // Phase 10: Use hak_base_ptr_t
        hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
        mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
#if HAKMEM_TINY_MAG_OWNER
        mag->items[mag->top].owner = NULL; // SuperSlab owner not a TinySlab; leave NULL
#endif
        mag->top++;
#if HAKMEM_DEBUG_COUNTERS
        g_magazine_push_count++; // Phase 7.6: Track pushes
#endif
        // Active → Inactive: decrement now (treated as inactive once the application frees it)
        ss_active_dec_one(ss);
        HAK_TP1(mag_push, class_idx);
        tiny_debug_ring_record(TINY_RING_EVENT_FREE_RETURN_MAG, (uint16_t)class_idx, ptr, 2);
        HAK_STAT_FREE(class_idx);
        return;
    }

    // Background spill: queue to BG thread instead of locking (when enabled)
    if (g_bg_spill_enable) {
        uint32_t qlen = atomic_load_explicit(&g_bg_spill_len[class_idx], memory_order_relaxed);
        if ((int)qlen < g_bg_spill_target) {
            // Build a small chain: include current ptr and pop from mag up to limit
            int limit = g_bg_spill_max_batch;
            if (limit > cap/2) limit = cap/2;
            if (limit > 32) limit = 32; // keep free-path bounded
2025-12-01 16:37:59 +09:00
            // Phase 10: Use hak_base_ptr_t
            void* head = HAK_BASE_TO_RAW(hak_user_to_base(HAK_USER_FROM_RAW(ptr)));
2025-11-10 18:04:08 +09:00
#if HAKMEM_TINY_HEADER_CLASSIDX
            const size_t next_off = 1; // Phase E1-CORRECT: Always 1
2025-11-10 18:04:08 +09:00
#else
            const size_t next_off = 0;
#endif
2025-11-14 01:02:00 +09:00
            // Build single-linked list via Box next-ptr API (per-class)
            tiny_next_write(class_idx, head, NULL);
            void* tail = head; // current tail
            int taken = 1;
            while (taken < limit && mag->top > 0) {
                void* p2 = mag->items[--mag->top].ptr;
2025-11-10 18:04:08 +09:00
#if HAKMEM_TINY_HEADER_CLASSIDX
                const size_t next_off2 = 1; // Phase E1-CORRECT: Always 1
2025-11-10 18:04:08 +09:00
#else
                const size_t next_off2 = 0;
#endif
2025-11-14 01:02:00 +09:00
                tiny_next_write(class_idx, p2, head);
                head = p2;
                taken++;
            }
            // Push chain to spill queue (single CAS)
            bg_spill_push_chain(class_idx, head, tail, taken);
            tiny_debug_ring_record(TINY_RING_EVENT_FREE_RETURN_MAG, (uint16_t)class_idx, ptr, 3);
            HAK_STAT_FREE(class_idx);
            return;
        }
    }

    // Spill half (SuperSlab version - simpler than TinySlab)
2025-12-01 16:37:59 +09:00
    pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m;
    // Profiling fix for debug build
    struct timespec tss;
    int ss_time = hkm_prof_begin(&tss);
    pthread_mutex_lock(lock);
    // Batch spill: reduce lock frequency and work per call
    int spill = cap / 2;
    int over = mag->top - (cap + g_spill_hyst);
    if (over > 0 && over < spill) spill = over;

    for (int i = 0; i < spill && mag->top > 0; i++) {
        TinyMagItem it = mag->items[--mag->top];

        // Phase 7.6: SuperSlab spill - return to freelist
        SuperSlab* owner_ss = hak_super_lookup(it.ptr);
        if (owner_ss && owner_ss->magic == SUPERSLAB_MAGIC) {
            // Direct freelist push (same as old hak_tiny_free_superslab)
2025-12-01 16:37:59 +09:00
            // Phase 10: it.ptr is BASE.
2025-12-03 12:29:31 +09:00
            // FIX: it.ptr is BASE, use it directly (do not subtract 1)
2025-12-01 16:37:59 +09:00
            void* base = it.ptr;
            int slab_idx = slab_index_for(owner_ss, base);
            // BUGFIX: Validate slab_idx before array access (prevents OOB)
            if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(owner_ss)) {
                continue; // Skip invalid index
            }
            TinySlabMeta* meta = &owner_ss->slabs[slab_idx];
2025-11-13 16:33:03 +09:00
            // Use per-slab class for freelist linkage (Phase 12)
            tiny_next_write(meta->class_idx, it.ptr, meta->freelist);
            meta->freelist = it.ptr;
            meta->used--;
            // Decrement SuperSlab active counter (spill returns blocks to SS)
            ss_active_dec_one(owner_ss);

            // Phase 8.4: Empty SuperSlab detection (will use meta->used scan)
            // TODO: Implement scan-based empty detection
            // Empty SuperSlab detection/munmap is handled separately by the flush API (kept off the hot path)
        }
    }

    pthread_mutex_unlock(lock);
    hkm_prof_end(ss_time, HKP_TINY_SPILL, &tss);

    // Adaptive increase of cap after spill
    int max_cap = tiny_cap_max_for_class(class_idx);
    if (mag->cap < max_cap) {
        int new_cap = mag->cap + (mag->cap / 2);
        if (new_cap > max_cap) new_cap = max_cap;
        if (new_cap > TINY_TLS_MAG_CAP) new_cap = TINY_TLS_MAG_CAP;
        mag->cap = new_cap;
    }

    // Finally, try FastCache push first (≤128B); compiled out if HAKMEM_TINY_NO_FRONT_CACHE is defined
#if !defined(HAKMEM_TINY_NO_FRONT_CACHE)
    if (g_fastcache_enable && class_idx <= 4) {
2025-12-01 16:37:59 +09:00
        // Phase 10: Use hak_base_ptr_t
        hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
2025-12-04 11:05:06 +09:00
        if (fastcache_push(class_idx, base_ptr)) {
            HAK_TP1(front_push, class_idx);
            HAK_STAT_FREE(class_idx);
            return;
        }
    }
#endif
    // Then TLS SLL if room, else magazine
2025-11-20 07:32:30 +09:00
    if (g_tls_sll_enable && g_tls_sll[class_idx].count < sll_cap_for_class(class_idx, (uint32_t)mag->cap)) {
2025-11-10 16:48:20 +09:00
        uint32_t sll_cap2 = sll_cap_for_class(class_idx, (uint32_t)mag->cap);
2025-12-01 16:37:59 +09:00
        // Phase 10: Use hak_base_ptr_t
        hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
        if (!tls_sll_push(class_idx, base_ptr, sll_cap2)) {
2025-11-10 16:48:20 +09:00
            // fallback to magazine
2025-12-01 16:37:59 +09:00
            mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
2025-11-10 16:48:20 +09:00
#if HAKMEM_TINY_MAG_OWNER
            mag->items[mag->top].owner = slab;
#endif
            mag->top++;
        }
        } else {
            // Phase 10: Use hak_base_ptr_t
            hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
            mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
#if HAKMEM_TINY_MAG_OWNER
            mag->items[mag->top].owner = slab;
#endif
            mag->top++;
        }

#if HAKMEM_DEBUG_COUNTERS
        g_magazine_push_count++; // Phase 7.6: Track pushes
#endif
        HAK_STAT_FREE(class_idx);
        return;
    }
#endif // HAKMEM_BUILD_RELEASE

    // Phase 7.6: TinySlab path (original)
    //g_tiny_free_with_slab_count++; // Phase 7.6: Track calls - DISABLED due to segfault
    // Same-thread → TLS magazine; remote-thread → MPSC stack
    if (slab && pthread_equal(slab->owner_tid, tiny_self_pt())) {
        int class_idx = slab->class_idx;

Phase E1-CORRECT: Fix USER/BASE pointer conversion bugs in slab_index_for calls
CRITICAL BUG FIX: Phase E1 introduced 1-byte headers for ALL size classes (C0-C7),
changing the pointer contract. However, many locations still called slab_index_for()
with USER pointers (storage+1) instead of BASE pointers (storage), causing off-by-one
slab index calculations that corrupted memory.
Root Cause:
- USER pointer = BASE + 1 (returned by malloc, points past header)
- BASE pointer = storage start (where 1-byte header is written)
- slab_index_for() expects BASE pointer for correct slab boundary calculations
- Passing USER pointer → wrong slab_idx → wrong metadata → freelist corruption
Impact Before Fix:
- bench_random_mixed crashes at ~14K iterations with SEGV
- Massive C7 alignment check failures (wrong slab classification)
- Memory corruption from writing to wrong slab freelists
Fixes Applied (8 locations):
1. core/hakmem_tiny_free.inc:137
- Added USER→BASE conversion before slab_index_for()
2. core/hakmem_tiny_ultra_simple.inc:148
- Added USER→BASE conversion before slab_index_for()
3. core/tiny_free_fast.inc.h:220
- Added USER→BASE conversion before slab_index_for()
4-5. core/tiny_free_magazine.inc.h:126,315
- Added USER→BASE conversion before slab_index_for() (2 locations)
6. core/box/free_local_box.c:14,22,62
- Added USER→BASE conversion before slab_index_for()
- Fixed delta calculation to use BASE instead of USER
- Fixed debug logging to use BASE instead of USER
7. core/hakmem_tiny.c:448,460,473 (tiny_debug_track_alloc_ret)
- Added USER→BASE conversion before slab_index_for() (2 calls)
- Fixed delta calculation to use BASE instead of USER
- This function is called on EVERY allocation in debug builds
Results After Fix:
✅ bench_random_mixed stable up to 66K iterations (~4.7x improvement)
✅ C7 alignment check failures eliminated (was: 100% failure rate)
✅ Front Gate "Unknown" classification dropped to 0% (was: 1.67%)
✅ No segfaults for workloads up to ~33K allocations
Remaining Issue:
❌ Segfault still occurs at iteration 66152 (allocs=33137, frees=33014)
- Different bug from USER/BASE conversion issues
- Likely capacity/boundary condition (further investigation needed)
Testing:
- bench_random_mixed_hakmem 1K-66K iterations: PASS
- bench_random_mixed_hakmem 67K+ iterations: FAIL (different bug)
- bench_fixed_size_hakmem 200K iterations: PASS
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 05:21:36 +09:00
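The USER/BASE contract described in the commit message above can be illustrated with a minimal sketch. The helper names, the `SKETCH_HEADER_BYTES` constant, and the alignment check are hypothetical stand-ins chosen for illustration; they are not the allocator's actual `hak_user_to_base()` or `slab_index_for()` implementations. The sketch only shows one way a USER pointer can break a calculation that assumes a BASE pointer: the offset from the slab start stops being a multiple of the block size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 1-byte header sits directly in front of the user pointer. */
#define SKETCH_HEADER_BYTES 1u

/* USER -> BASE: step back over the header. */
static inline void* sketch_user_to_base(void* user) {
    return (uint8_t*)user - SKETCH_HEADER_BYTES;
}

/* BASE -> USER: step forward past the header. */
static inline void* sketch_base_to_user(void* base) {
    return (uint8_t*)base + SKETCH_HEADER_BYTES;
}

/* Returns 1 if p lies on a block boundary of a slab that starts at slab_start. */
static inline int sketch_is_block_aligned(const void* slab_start, const void* p,
                                          size_t block_size) {
    assert(block_size > 0);
    uintptr_t delta = (uintptr_t)p - (uintptr_t)slab_start;
    return (delta % block_size) == 0;
}
```

With a 32-byte block whose BASE is at offset 64, the BASE pointer passes the boundary check while the USER pointer (offset 65) fails it, which is consistent with the alignment-check failures the commit message reports.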
        // Phase E1-CORRECT: C7 now has headers, can use TLS list like other classes
        if (g_tls_list_enable) {
            TinyTLSList* tls = &g_tls_lists[class_idx];
            uint32_t seq = atomic_load_explicit(&g_tls_param_seq[class_idx], memory_order_relaxed);
            if (__builtin_expect(seq != g_tls_param_seen[class_idx], 0)) {
                tiny_tls_refresh_params(class_idx, tls);
            }
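The sequence check above is a common way to let a thread notice that shared tuning parameters changed without taking a lock on the hot path. The following is a minimal sketch of that pattern under stated assumptions: every name prefixed `sketch_` is hypothetical, the single `cap` parameter stands in for whatever `tiny_tls_refresh_params()` actually refreshes, and the acquire fence is an addition of this sketch (the original loads the sequence with `memory_order_relaxed`; what ordering the real refresh function relies on is not shown here). `__builtin_expect` is a GCC/Clang builtin, as in the original.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct { uint32_t cap; } SketchTlsList;

static _Atomic uint32_t sketch_param_seq = 0;    /* bumped whenever parameters change */
static _Atomic uint32_t sketch_global_cap = 64;  /* the shared parameter itself */
static _Thread_local uint32_t sketch_param_seen = 0; /* last sequence this thread applied */

/* Writer side: store the new value, then publish it by bumping the sequence. */
static void sketch_publish_cap(uint32_t cap) {
    atomic_store_explicit(&sketch_global_cap, cap, memory_order_relaxed);
    atomic_fetch_add_explicit(&sketch_param_seq, 1, memory_order_release);
}

/* Reader side: cheap relaxed check; refresh only when the sequence moved.
 * Returns 1 if the thread-local copy was refreshed, 0 otherwise. */
static int sketch_refresh_if_stale(SketchTlsList* tls) {
    uint32_t seq = atomic_load_explicit(&sketch_param_seq, memory_order_relaxed);
    if (__builtin_expect(seq != sketch_param_seen, 0)) {
        /* Pair with the writer's release so the new cap is visible. */
        atomic_thread_fence(memory_order_acquire);
        tls->cap = atomic_load_explicit(&sketch_global_cap, memory_order_relaxed);
        sketch_param_seen = seq;
        return 1;
    }
    return 0;
}
```

In the common case the reader pays for one relaxed load and one predictable branch; the refresh cost is only incurred after a parameter update.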
            // TinyHotMag front push (8/16/32B, A/B)
            if (__builtin_expect(g_hotmag_enable && class_idx <= 2, 1)) {
                // Phase 10: Use hak_base_ptr_t
                void* base = HAK_BASE_TO_RAW(hak_user_to_base(HAK_USER_FROM_RAW(ptr)));
                if (hotmag_push(class_idx, base)) {
                    HAK_STAT_FREE(class_idx);
                    return;
                }
            }
            if (tls->count < tls->cap) {
                // Phase 10: Use hak_base_ptr_t
                void* base = HAK_BASE_TO_RAW(hak_user_to_base(HAK_USER_FROM_RAW(ptr)));
                tiny_tls_list_guard_push(class_idx, tls, base);
                tls_list_push_fast(tls, base, class_idx);
                HAK_STAT_FREE(class_idx);
                return;
            }
            seq = atomic_load_explicit(&g_tls_param_seq[class_idx], memory_order_relaxed);
            if (__builtin_expect(seq != g_tls_param_seen[class_idx], 0)) {
                tiny_tls_refresh_params(class_idx, tls);
            }
            {
                // Phase 10: Use hak_base_ptr_t
                void* base = HAK_BASE_TO_RAW(hak_user_to_base(HAK_USER_FROM_RAW(ptr)));
                tiny_tls_list_guard_push(class_idx, tls, base);
                tls_list_push_fast(tls, base, class_idx);
            }
            if (tls_list_should_spill(tls)) {
                tls_list_spill_excess(class_idx, tls);
            }
            HAK_STAT_FREE(class_idx);
            return;
        }

        tiny_mag_init_if_needed(class_idx);
        TinyTLSMag* mag = &g_tls_mags[class_idx];
        int cap = mag->cap;
        // 32/64B: prefer SLL (magazine-first is disabled)
        // Fast path: FastCache push (preferred for ≤128B), then TLS SLL
        if (g_fastcache_enable && class_idx <= 4) {
            if (fastcache_push(class_idx, HAK_BASE_FROM_RAW(ptr))) {
                HAK_STAT_FREE(class_idx);
                return;
            }
        }
        // Fast path: TLS SLL push (preferred)
        if (!g_tls_list_enable && g_tls_sll_enable && class_idx <= 5) {
            uint32_t sll_cap = sll_cap_for_class(class_idx, (uint32_t)cap);
            if (g_tls_sll[class_idx].count < sll_cap) {
                // Phase 10: Use hak_base_ptr_t
                hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
                if (tls_sll_push(class_idx, base_ptr, sll_cap)) {
                    HAK_STAT_FREE(class_idx);
                    return;
                }
            }
        }
        // Next: if magazine has room, push immediately and return (if full, bulk-move mag → SLL first)
        if (mag->top >= cap) {
            (void)bulk_mag_to_sll_if_room(class_idx, mag, cap / 2);
        }
        // Remote-drain can be handled opportunistically on future calls.
        if (mag->top < cap) {
            {
                // Phase 10: Use hak_base_ptr_t
                hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
                mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
#if HAKMEM_TINY_MAG_OWNER
                mag->items[mag->top].owner = slab;
#endif
                mag->top++;
            }

#if HAKMEM_DEBUG_COUNTERS
            g_magazine_push_count++; // Phase 7.6: Track pushes
#endif
            // Note: SuperSlab uses separate path (slab == NULL branch above)
            HAK_STAT_FREE(class_idx); // Phase 3
            return;
        }
        // Magazine full: before spilling, opportunistically drain remotes once under lock.
        if (atomic_load_explicit(&slab->remote_count, memory_order_relaxed) >= (unsigned)g_remote_drain_thresh_per_class[class_idx] || atomic_load_explicit(&slab->remote_head, memory_order_acquire)) {
            pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m;
            pthread_mutex_lock(lock);
            HAK_TP1(remote_drain, class_idx);
            tiny_remote_drain_locked(slab);
            pthread_mutex_unlock(lock);
        }
        // Spill half under class lock
        pthread_mutex_t* lock = &g_tiny_class_locks[class_idx].m;
        pthread_mutex_lock(lock);
        int spill = cap / 2;

        // Phase 4.2: High-water threshold for gating Phase 4 logic
        int high_water = (cap * 3) / 4; // 75% of capacity

        for (int i = 0; i < spill && mag->top > 0; i++) {
            TinyMagItem it = mag->items[--mag->top];

            // Phase 7.6: Check for SuperSlab first (mixed Magazine support)
            SuperSlab* ss_owner = hak_super_lookup(it.ptr);
            if (ss_owner && ss_owner->magic == SUPERSLAB_MAGIC) {
                // SuperSlab spill - return to freelist
                // FIX: it.ptr is BASE, use directly
                void* base = it.ptr;
                int slab_idx = slab_index_for(ss_owner, base);
                // BUGFIX: Validate slab_idx before array access (prevents OOB)
                if (slab_idx < 0 || slab_idx >= ss_slabs_capacity(ss_owner)) {
                    HAK_STAT_FREE(class_idx);
                    continue; // Skip invalid index
                }
                TinySlabMeta* meta = &ss_owner->slabs[slab_idx];
                // Use per-slab class for freelist linkage (Phase 12)
                tiny_next_write(meta->class_idx, it.ptr, meta->freelist);
                meta->freelist = it.ptr;
                meta->used--;
                // Empty-SuperSlab handling is left to flush/background processing (kept off the hot path)
                HAK_STAT_FREE(class_idx);
                continue; // Skip TinySlab processing
            }

            TinySlab* owner =
#if HAKMEM_TINY_MAG_OWNER
                it.owner;
#else
                NULL;
#endif
            if (!owner) {
                owner = tls_active_owner_for_ptr(class_idx, it.ptr);
            }
            if (!owner) {
                owner = hak_tiny_owner_slab(it.ptr);
            }
            if (!owner) continue;

            // Phase 4.2: Adaptive gating - skip Phase 4 when TLS Magazine is high-water
            // Rationale: When mag->top >= 75%, next alloc will come from TLS anyway
            // so pushing to mini-mag is wasted work
            int is_high_water = (mag->top >= high_water);

            if (!is_high_water) {
                // Low-water: Phase 4.1 logic (try mini-magazine first)
                uint8_t cidx = owner->class_idx; // Option A: read only once
                TinySlab* tls_a = g_tls_active_slab_a[cidx];
                TinySlab* tls_b = g_tls_active_slab_b[cidx];

                // Option B: Branch prediction hint (a spill returning to a TLS-active slab is likely)
                if (__builtin_expect((owner == tls_a || owner == tls_b) &&
                                     !mini_mag_is_full(&owner->mini_mag), 1)) {
                    // Fast path: return to the mini-magazine (bitmap untouched)
                    mini_mag_push(&owner->mini_mag, it.ptr);
                    HAK_TP1(spill_tiny, cidx);
                    HAK_STAT_FREE(cidx);
                    continue; // skip bitmap update
                }
            }
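The interaction between the spill count (`cap / 2`) and the high-water threshold (`(cap * 3) / 4`) is easier to see with numbers. The helper below is a hypothetical sketch, not part of the allocator: it replays only the loop counter and the `--mag->top` decrement from the spill loop above and counts how many popped items would pass the low-water test, ignoring the SuperSlab and owner checks.

```c
#include <assert.h>

/* Counts spilled items that fall below the 75% high-water mark, i.e. items
 * for which the mini-magazine path would be attempted at all.
 * top: magazine fill level when the spill starts; cap: magazine capacity. */
static int sketch_count_low_water_items(int top, int cap) {
    assert(cap > 0 && top >= 0);
    int spill = cap / 2;             /* spill half of the capacity */
    int high_water = (cap * 3) / 4;  /* 75% of capacity */
    int eligible = 0;
    for (int i = 0; i < spill && top > 0; i++) {
        --top;                       /* mirrors mag->items[--mag->top] */
        if (top < high_water) eligible++;
    }
    return eligible;
}
```

For a full magazine with `cap = 64`, the loop pops 32 items while `top` falls from 63 to 32. The first 16 pops still see `top >= 48` and go straight to the bitmap path; only the remaining 16 are candidates for the mini-magazine.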
            // High-water or Phase 4.1 mini-mag full: fall through to bitmap

            // Slow path: write the bitmap directly (existing logic)
            size_t bs = g_tiny_class_sizes[owner->class_idx];
            int idx = ((uintptr_t)it.ptr - (uintptr_t)owner->base) / bs;
            if (hak_tiny_is_used(owner, idx)) {
                hak_tiny_set_free(owner, idx);
                int was_full = (owner->free_count == 0);
                owner->free_count++;
                if (was_full) move_to_free_list(owner->class_idx, owner);
                if (owner->free_count == owner->total_count) {
                    // If this slab is TLS-active for this thread, clear the pointer before releasing
                    if (g_tls_active_slab_a[owner->class_idx] == owner) g_tls_active_slab_a[owner->class_idx] = NULL;
                    if (g_tls_active_slab_b[owner->class_idx] == owner) g_tls_active_slab_b[owner->class_idx] = NULL;
                    TinySlab** headp = &g_tiny_pool.free_slabs[owner->class_idx];
                    TinySlab* prev = NULL;
                    for (TinySlab* s = *headp; s; prev = s, s = s->next) {
                        if (s == owner) { if (prev) prev->next = s->next; else *headp = s->next; break; }
                    }
                    release_slab(owner);
                }
                HAK_TP1(spill_tiny, owner->class_idx);
                HAK_STAT_FREE(owner->class_idx);
            }
        }
        pthread_mutex_unlock(lock);
        hkm_prof_end(ss_time, HKP_TINY_SPILL, &tss);
|
|
|
|
            // Adaptive increase of cap after spill
            int max_cap = tiny_cap_max_for_class(class_idx);
            if (mag->cap < max_cap) {
                int new_cap = mag->cap + (mag->cap / 2);
                if (new_cap > max_cap) new_cap = max_cap;
                if (new_cap > TINY_TLS_MAG_CAP) new_cap = TINY_TLS_MAG_CAP;
                mag->cap = new_cap;
            }
            // Finally: prefer TinyQuickSlot → SLL → UltraFront → HotMag → Magazine (this order preserves locality)
#if !HAKMEM_BUILD_RELEASE && !defined(HAKMEM_TINY_NO_QUICK)
            if (g_quick_enable && class_idx <= 4) {
                TinyQuickSlot* qs = &g_tls_quick[class_idx];
                if (__builtin_expect(qs->top < QUICK_CAP, 1)) {
                    // Phase 10: Use hak_base_ptr_t
                    hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
                    qs->items[qs->top++] = HAK_BASE_TO_RAW(base_ptr);
                } else if (g_tls_sll_enable) {
                    uint32_t sll_cap2 = sll_cap_for_class(class_idx, (uint32_t)mag->cap);
                    if (g_tls_sll[class_idx].count < sll_cap2) {
                        // Phase 10: Use hak_base_ptr_t
                        hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
                        if (!tls_sll_push(class_idx, base_ptr, sll_cap2)) {
                            if (!tiny_optional_push(class_idx, HAK_BASE_TO_RAW(base_ptr))) {
                                mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
#if HAKMEM_TINY_MAG_OWNER
                                mag->items[mag->top].owner = slab;
#endif
                                mag->top++;
                            }
                        }
                    } else if (!tiny_optional_push(class_idx, HAK_BASE_TO_RAW(hak_user_to_base(HAK_USER_FROM_RAW(ptr))))) { // FIX: use ptr_user_to_base
                        // Phase 10: Use hak_base_ptr_t
                        hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
                        mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
#if HAKMEM_TINY_MAG_OWNER
                        mag->items[mag->top].owner = slab;
#endif
                        mag->top++;
                    }
                } else {
                    // Phase 10: Use hak_base_ptr_t
                    hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
                    if (!tiny_optional_push(class_idx, HAK_BASE_TO_RAW(base_ptr))) {
                        mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
#if HAKMEM_TINY_MAG_OWNER
                        mag->items[mag->top].owner = slab;
#endif
                        mag->top++;
                    }
                }
            } else
#endif
            {
                if (g_tls_sll_enable && class_idx <= 5) {
                    uint32_t sll_cap2 = sll_cap_for_class(class_idx, (uint32_t)mag->cap);
                    if (g_tls_sll[class_idx].count < sll_cap2) {
                        // Phase 10: Use hak_base_ptr_t
                        hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
                        if (!tls_sll_push(class_idx, base_ptr, sll_cap2)) {
                            if (!tiny_optional_push(class_idx, HAK_BASE_TO_RAW(base_ptr))) {
                                mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
#if HAKMEM_TINY_MAG_OWNER
                                mag->items[mag->top].owner = slab;
#endif
                                mag->top++;
                            }
                        }
                    } else if (!tiny_optional_push(class_idx, HAK_BASE_TO_RAW(hak_user_to_base(HAK_USER_FROM_RAW(ptr))))) { // FIX: use ptr_user_to_base
                        // Phase 10: Use hak_base_ptr_t
                        hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
                        mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
#if HAKMEM_TINY_MAG_OWNER
                        mag->items[mag->top].owner = slab;
#endif
                        mag->top++;
                    }
                } else {
                    // Phase 10: Use hak_base_ptr_t
                    hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
                    if (!tiny_optional_push(class_idx, HAK_BASE_TO_RAW(base_ptr))) {
                        mag->items[mag->top].ptr = HAK_BASE_TO_RAW(base_ptr);
#if HAKMEM_TINY_MAG_OWNER
                        mag->items[mag->top].owner = slab;
#endif
                        mag->top++;
                    }
                }
            }

#if HAKMEM_DEBUG_COUNTERS
            g_magazine_push_count++; // Phase 7.6: Track pushes
#endif
            // Note: SuperSlab uses separate path (slab == NULL branch above)
            HAK_STAT_FREE(class_idx); // Phase 3
            return;
        } else if (slab) {
Phase E1-CORRECT: Fix USER/BASE pointer conversion bugs in slab_index_for calls
CRITICAL BUG FIX: Phase E1 introduced 1-byte headers for ALL size classes (C0-C7),
changing the pointer contract. However, many locations still called slab_index_for()
with USER pointers (storage+1) instead of BASE pointers (storage), causing off-by-one
slab index calculations that corrupted memory.
Root Cause:
- USER pointer = BASE + 1 (returned by malloc, points past header)
- BASE pointer = storage start (where 1-byte header is written)
- slab_index_for() expects BASE pointer for correct slab boundary calculations
- Passing USER pointer → wrong slab_idx → wrong metadata → freelist corruption
Impact Before Fix:
- bench_random_mixed crashes at ~14K iterations with SEGV
- Massive C7 alignment check failures (wrong slab classification)
- Memory corruption from writing to wrong slab freelists
Fixes Applied (8 locations):
1. core/hakmem_tiny_free.inc:137
- Added USER→BASE conversion before slab_index_for()
2. core/hakmem_tiny_ultra_simple.inc:148
- Added USER→BASE conversion before slab_index_for()
3. core/tiny_free_fast.inc.h:220
- Added USER→BASE conversion before slab_index_for()
4-5. core/tiny_free_magazine.inc.h:126,315
- Added USER→BASE conversion before slab_index_for() (2 locations)
6. core/box/free_local_box.c:14,22,62
- Added USER→BASE conversion before slab_index_for()
- Fixed delta calculation to use BASE instead of USER
- Fixed debug logging to use BASE instead of USER
7. core/hakmem_tiny.c:448,460,473 (tiny_debug_track_alloc_ret)
- Added USER→BASE conversion before slab_index_for() (2 calls)
- Fixed delta calculation to use BASE instead of USER
- This function is called on EVERY allocation in debug builds
Results After Fix:
✅ bench_random_mixed stable up to 66K iterations (~4.7x improvement)
✅ C7 alignment check failures eliminated (was: 100% failure rate)
✅ Front Gate "Unknown" classification dropped to 0% (was: 1.67%)
✅ No segfaults for workloads up to ~33K allocations
Remaining Issue:
❌ Segfault still occurs at iteration 66152 (allocs=33137, frees=33014)
- Different bug from USER/BASE conversion issues
- Likely capacity/boundary condition (further investigation needed)
Testing:
- bench_random_mixed_hakmem 1K-66K iterations: PASS
- bench_random_mixed_hakmem 67K+ iterations: FAIL (different bug)
- bench_fixed_size_hakmem 200K iterations: PASS
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-13 05:21:36 +09:00
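The USER/BASE contract described in the commit message above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the hakmem implementation: the names demo_user_to_base, demo_base_to_user, demo_delta, and demo_check and the constants DEMO_HEADER_SIZE and DEMO_BLOCK_SIZE are hypothetical, and the 64-byte block stride is chosen only for the example. It shows why a block-offset calculation comes out aligned when fed a BASE pointer and off by one byte when fed a USER pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not hakmem's real constants. */
#define DEMO_HEADER_SIZE 1u   /* 1-byte header stored at the BASE address */
#define DEMO_BLOCK_SIZE  64u  /* assumed block stride, header included */

/* USER pointer (what the caller sees) -> BASE pointer (where the header lives). */
static void* demo_user_to_base(void* user) {
    return (uint8_t*)user - DEMO_HEADER_SIZE;
}

/* BASE pointer -> USER pointer (first byte past the header). */
static void* demo_base_to_user(void* base) {
    return (uint8_t*)base + DEMO_HEADER_SIZE;
}

/* Byte offset of p from the start of its slab. */
static size_t demo_delta(const uint8_t* slab_start, const void* p) {
    return (size_t)((const uint8_t*)p - slab_start);
}

/* Returns 1 when the invariants from the commit message hold for block #2. */
static int demo_check(void) {
    static uint8_t slab[4 * DEMO_BLOCK_SIZE];
    void* base = slab + 2 * DEMO_BLOCK_SIZE;
    void* user = demo_base_to_user(base);

    /* Round trip: converting USER back must land on BASE. */
    assert(demo_user_to_base(user) == base);

    /* BASE offsets are multiples of the block stride. */
    if (demo_delta(slab, base) % DEMO_BLOCK_SIZE != 0) return 0;

    /* USER offsets are skewed by the header size, so an alignment
     * check performed on a USER pointer fails. */
    if (demo_delta(slab, user) % DEMO_BLOCK_SIZE != DEMO_HEADER_SIZE) return 0;

    /* Converting first recovers the expected block index. */
    if (demo_delta(slab, demo_user_to_base(user)) / DEMO_BLOCK_SIZE != 2) return 0;

    return 1;
}
```

Calling demo_check() returns 1 under these assumptions. The point of the sketch is that the conversion has to happen before any slab or block arithmetic, which is what the fixes listed above add at each call site.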
            // Phase E1-CORRECT: ALL classes (C0-C7) have 1-byte header
            // FIX: Use ptr_user_to_base to get correct base
            void* base = HAK_BASE_TO_RAW(hak_user_to_base(HAK_USER_FROM_RAW(ptr)));
            tiny_remote_push(slab, base);
        }
    }