#include <inttypes.h>
#include <pthread.h>

#include "tiny_remote.h"
#include "slab_handle.h"
#include "tiny_refill.h"
#include "tiny_tls_guard.h"

/*
 * Phase 1: Box Theory refactoring + include reduction
 *
 * Phase 1-1: Split hakmem_tiny_free.inc (1,711 → 452 lines, -73%)
 * - Created tiny_free_magazine.inc.h (413 lines) - Magazine layer
 * - Created tiny_superslab_alloc.inc.h (394 lines) - SuperSlab alloc
 * - Created tiny_superslab_free.inc.h (305 lines) - SuperSlab free
 *
 * Phase 1-2++: Refactor hakmem_pool.c (1,481 → 907 lines, -38.8%)
 * - Created pool_tls_types.inc.h (32 lines) - TLS structures
 * - Created pool_mf2_types.inc.h (266 lines) - MF2 data structures
 * - Created pool_mf2_helpers.inc.h (158 lines) - Helper functions
 * - Created pool_mf2_adoption.inc.h (129 lines) - Adoption logic
 *
 * Phase 1-3: Reduce hakmem_tiny.c includes (60 → 46, -23.3%)
 * - Created tiny_system.h - System headers umbrella (stdio, stdlib, etc.)
 * - Created tiny_api.h - API headers umbrella (stats, query, rss, registry)
 *
 * Performance: 4.19M ops/s maintained (±0% regression)
 * Verified: Larson benchmark 2×8×128×1024 = 4,192,128 ops/s
 */
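
/*
 * A hypothetical sketch of what an "umbrella" header such as tiny_system.h
 * could look like (its actual contents are not shown in this change, so the
 * include list below is an assumption): one header that pulls in every system
 * dependency, so hakmem_tiny.c needs a single #include instead of a dozen.
 *
 *     // tiny_system.h (sketch)
 *     #ifndef TINY_SYSTEM_H
 *     #define TINY_SYSTEM_H
 *     #include <stdio.h>
 *     #include <stdlib.h>
 *     #include <string.h>
 *     #include <stdint.h>
 *     #include <stdatomic.h>
 *     #endif
 */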

#include "box/free_publish_box.h"
#include "box/tls_sll_box.h" // Box TLS-SLL: C7-safe push/pop/splice

/*
 * Phase E1-CORRECT: Fix USER/BASE pointer conversion bugs in slab_index_for calls
 *
 * CRITICAL BUG FIX: Phase E1 introduced 1-byte headers for ALL size classes
 * (C0-C7), changing the pointer contract. However, many call sites still
 * passed USER pointers (storage+1) to slab_index_for() instead of BASE
 * pointers (storage), causing off-by-one slab index calculations that
 * corrupted memory.
 *
 * Root Cause:
 * - USER pointer = BASE + 1 (returned by malloc, points past the header)
 * - BASE pointer = storage start (where the 1-byte header is written)
 * - slab_index_for() expects a BASE pointer for correct slab boundary calculations
 * - Passing a USER pointer → wrong slab_idx → wrong metadata → freelist corruption
 *
 * Impact Before Fix:
 * - bench_random_mixed crashes at ~14K iterations with SEGV
 * - Massive C7 alignment check failures (wrong slab classification)
 * - Memory corruption from writing to wrong slab freelists
 *
 * Fixes Applied (8 locations), each adding a USER→BASE conversion before
 * slab_index_for():
 * 1.   core/hakmem_tiny_free.inc:137
 * 2.   core/hakmem_tiny_ultra_simple.inc:148
 * 3.   core/tiny_free_fast.inc.h:220
 * 4-5. core/tiny_free_magazine.inc.h:126,315 (2 locations)
 * 6.   core/box/free_local_box.c:14,22,62 - also fixed the delta calculation
 *      and debug logging to use BASE instead of USER
 * 7.   core/hakmem_tiny.c:448,460,473 (tiny_debug_track_alloc_ret, 2 calls) -
 *      also fixed the delta calculation; this function runs on EVERY
 *      allocation in debug builds
 *
 * Results After Fix:
 * ✅ bench_random_mixed stable up to 66K iterations (~4.7x improvement)
 * ✅ C7 alignment check failures eliminated (was: 100% failure rate)
 * ✅ Front Gate "Unknown" classification dropped to 0% (was: 1.67%)
 * ✅ No segfaults for workloads up to ~33K allocations
 *
 * Remaining Issue:
 * ❌ Segfault still occurs at iteration 66152 (allocs=33137, frees=33014).
 *    This is a different bug from the USER/BASE conversion issues, likely a
 *    capacity/boundary condition (further investigation needed).
 *
 * Testing:
 * - bench_random_mixed_hakmem 1K-66K iterations: PASS
 * - bench_random_mixed_hakmem 67K+ iterations: FAIL (different bug)
 * - bench_fixed_size_hakmem 200K iterations: PASS
 */
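
/*
 * Minimal sketch of the USER/BASE contract described above (illustrative
 * only; the real conversion helpers live in box/ptr_type_box.h and may
 * differ in naming and validation):
 *
 *     // BASE points at the block start, where the 1-byte header is written.
 *     // USER = BASE + 1 is what malloc() hands back to the caller.
 *     static inline void* sketch_user_to_base(void* user) { return (char*)user - 1; }
 *     static inline void* sketch_base_to_user(void* base) { return (char*)base + 1; }
 *
 *     // slab_index_for() must see BASE: for a block that starts exactly on a
 *     // slab boundary, USER = BASE + 1 lands inside the next slab's range,
 *     // which is precisely the off-by-one slab index described above.
 *     int idx = slab_index_for(ss, sketch_user_to_base(user));
 */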

#include "box/tiny_next_ptr_box.h" // Box API: next pointer read/write
#include "box/tiny_header_box.h" // Header Box: Single Source of Truth for header operations
#include "box/tiny_front_config_box.h" // Phase 7-Step5: Config macros for dead code elimination
#include "tiny_region_id.h" // HEADER_MAGIC, HEADER_CLASS_MASK for freelist header restoration
#include "mid_tcache.h"
#include "front/tiny_heap_v2.h"
#include "box/ptr_type_box.h" // Phase 10: Type Safety

/*
 * Implement Phase 1: TLS SuperSlab Hint Box for Headerless performance
 *
 * Design: Cache recently-used SuperSlab references in TLS to accelerate
 * ptr→SuperSlab resolution in the Headerless-mode free() path.
 *
 * ## Implementation
 *
 * ### New Box: core/box/tls_ss_hint_box.h
 * - Header-only Box (4-slot FIFO cache per thread)
 * - Functions: tls_ss_hint_init(), tls_ss_hint_update(), tls_ss_hint_lookup(), tls_ss_hint_clear()
 * - Memory overhead: 112 bytes per thread (negligible)
 * - Statistics API for debug builds (hit/miss counters)
 *
 * ### Integration Points
 * 1. Free path (core/hakmem_tiny_free.inc):
 *    - Lines 477-481: fast-path hint lookup before hak_super_lookup()
 *    - Lines 550-555: second lookup location (fallback path)
 *    - Expected savings: 10-50 cycles → 2-5 cycles on a cache hit
 * 2. Allocation path (core/tiny_superslab_alloc.inc.h):
 *    - Lines 115-122: linear allocation return path
 *    - Lines 179-186: freelist allocation return path
 *    - Cache update on successful allocation
 * 3. TLS variable (core/hakmem_tiny_tls_state_box.inc):
 *    - __thread TlsSsHintCache g_tls_ss_hint = {0};
 *
 * ### Build System
 * - Build flag (core/hakmem_build_flags.h): HAKMEM_TINY_SS_TLS_HINT
 *   (default: 0, disabled); validation requires HAKMEM_TINY_HEADERLESS=1
 * - Makefile: removed old ss_tls_hint_box.o (conflicting implementation);
 *   the header-only design eliminates compiled object files
 *
 * ### Testing
 * - Unit tests (tests/test_tls_ss_hint.c): 6 test functions covering init,
 *   lookup, FIFO rotation, duplicates, clear, stats; all tests PASSING
 * - Build validation:
 *   ✅ Compiles with the hint disabled (default)
 *   ✅ Compiles with the hint enabled (HAKMEM_TINY_SS_TLS_HINT=1)
 *
 * ### Documentation
 * - Benchmark report (docs/PHASE1_TLS_HINT_BENCHMARK.md): implementation
 *   summary, build validation results, benchmark methodology (to be
 *   executed), performance analysis framework
 *
 * ## Expected Performance
 * - Hit rate: 85-95% (single-threaded), 70-85% (multi-threaded)
 * - Cycle savings: 80-95% on a cache hit (10-50 cycles → 2-5 cycles)
 * - Target improvement: 15-20% throughput increase vs the Headerless baseline
 * - Memory overhead: 112 bytes per thread
 *
 * ## Box Theory
 * Mission:       cache hot SuperSlabs to avoid the global registry lookup
 * Boundary:      ptr → SuperSlab* or NULL (miss)
 * Invariant:     hint.base ≤ ptr < hint.end → hit is valid
 * Fallback:      always safe to miss (triggers hak_super_lookup)
 * Thread Safety: TLS storage, no synchronization required
 * Risk:          low (read-only cache, fail-safe fallback, magic validation)
 *
 * ## Next Steps
 * 1. Run the full benchmark suite (sh8bench, cfrac, larson)
 * 2. Measure the actual hit rate with stats enabled
 * 3. If the performance target is met (15-20% improvement), enable by default
 * 4. Consider increasing cache slots if the hit rate is < 80%
 */
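
/*
 * Assumption-level sketch of the 4-slot FIFO hint cache described above,
 * built around the invariant "hint.base <= ptr < hint.end => hit is valid".
 * The real TlsSsHintCache in box/tls_ss_hint_box.h may differ in layout and
 * validation (e.g. magic checks):
 *
 *     typedef struct {
 *         uintptr_t base[4], end[4];   // cached SuperSlab address ranges
 *         SuperSlab* ss[4];
 *         uint32_t next;               // FIFO replacement cursor
 *     } SketchSsHintCache;
 *
 *     static inline SuperSlab* sketch_hint_lookup(SketchSsHintCache* c, void* p) {
 *         uintptr_t a = (uintptr_t)p;
 *         for (int i = 0; i < 4; i++)
 *             if (a >= c->base[i] && a < c->end[i]) return c->ss[i];  // hit
 *         return NULL;  // miss: caller falls back to hak_super_lookup()
 *     }
 *
 *     static inline void sketch_hint_update(SketchSsHintCache* c, SuperSlab* ss) {
 *         uint32_t i = c->next++ & 3;  // rotate over the 4 slots
 *         c->base[i] = (uintptr_t)ss;
 *         c->end[i]  = (uintptr_t)ss + ((uintptr_t)1 << ss->lg_size);
 *         c->ss[i]   = ss;
 *     }
 */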

#if HAKMEM_TINY_SS_TLS_HINT
#include "box/tls_ss_hint_box.h" // Phase 1: TLS SuperSlab Hint Cache for Headerless mode
#endif

// Phase 3d-B: TLS Cache Merge - Unified TLS SLL structure
extern __thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES];

#if !HAKMEM_BUILD_RELEASE
#include "hakmem_tiny_magazine.h"
#endif

extern int g_tiny_force_remote;

// ENV: HAKMEM_TINY_DRAIN_TO_SLL (0=off): at adopt/bind boundaries, splice up to N blocks from the slab freelist into the TLS SLL
static inline int tiny_drain_to_sll_budget(void) {
    static int v = -1;
    if (__builtin_expect(v == -1, 0)) {
        const char* s = getenv("HAKMEM_TINY_DRAIN_TO_SLL");
        int parsed = (s && *s) ? atoi(s) : 0;
        if (parsed < 0) parsed = 0;
        if (parsed > 256) parsed = 256;
        v = parsed;
    }
    return v;
}
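
/*
 * Usage sketch: the budget is read once per process and clamped to [0, 256].
 * For example,
 *     HAKMEM_TINY_DRAIN_TO_SLL=32 ./bench_random_mixed_hakmem
 * drains at most 32 blocks from a slab freelist into the TLS SLL at each
 * adopt/bind boundary; leaving the variable unset (or set to 0) disables
 * draining entirely.
 */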

static inline void tiny_drain_freelist_to_sll_once(SuperSlab* ss, int slab_idx, int class_idx) {
    int budget = tiny_drain_to_sll_budget();
    if (__builtin_expect(budget <= 0, 1)) return;
    // Phase E1-CORRECT: C7 now has headers, can use TLS SLL like other classes
    // (removed early return for class_idx == 7)
    if (!(ss && ss->magic == SUPERSLAB_MAGIC)) return;
    if (slab_idx < 0) return;
    TinySlabMeta* m = &ss->slabs[slab_idx];
    int moved = 0;
    while (m->freelist && moved < budget) {
        void* p = m->freelist;
        // CORRUPTION DEBUG: Validate freelist pointer before moving to TLS SLL
        if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
            extern const size_t g_tiny_class_sizes[];
            size_t blk = g_tiny_class_sizes[class_idx];
            void* old_head_raw = HAK_BASE_TO_RAW(g_tls_sll[class_idx].head);
            // Validate p alignment
            if (((uintptr_t)p % blk) != 0) {
                fprintf(stderr, "[DRAIN_CORRUPT] Freelist ptr=%p misaligned (cls=%d blk=%zu offset=%zu)\n",
                        p, class_idx, blk, (size_t)((uintptr_t)p % blk));
                fprintf(stderr, "[DRAIN_CORRUPT] Attempting to drain corrupted freelist to TLS SLL!\n");
                fprintf(stderr, "[DRAIN_CORRUPT] ss=%p slab=%d moved=%d/%d\n", (void*)ss, slab_idx, moved, budget);
                abort();
            }
            // Validate old_head alignment if not NULL
            if (old_head_raw && ((uintptr_t)old_head_raw % blk) != 0) {
                fprintf(stderr, "[DRAIN_CORRUPT] TLS SLL head=%p already corrupted! (cls=%d blk=%zu offset=%zu)\n",
                        old_head_raw, class_idx, blk, (size_t)((uintptr_t)old_head_raw % blk));
                fprintf(stderr, "[DRAIN_CORRUPT] Corruption detected BEFORE drain write (ptr=%p)\n", p);
                fprintf(stderr, "[DRAIN_CORRUPT] ss=%p slab=%d moved=%d/%d\n", (void*)ss, slab_idx, moved, budget);
                abort();
            }
            fprintf(stderr, "[DRAIN_TO_SLL] cls=%d ptr=%p old_head=%p moved=%d/%d\n",
                    class_idx, p, old_head_raw, moved, budget);
        }
        m->freelist = tiny_next_read(class_idx, p); // Phase E1-CORRECT: Box API
        // CRITICAL FIX: Restore header BEFORE pushing to TLS SLL
        // Freelist blocks may have stale data at offset 0
        // Uses Header Box API (C1-C6 only; C0/C7 skip)
        tiny_header_write_if_preserved(p, class_idx);
        // Use Box TLS-SLL API (C7-safe push)
        // Note: C7 is no longer rejected up front (Phase E1-CORRECT); the push
        // only fails when the TLS SLL is at capacity, handled below
        uint32_t sll_capacity = 256; // Conservative limit
        // Phase 10: p is BASE pointer (freelist), wrap it
        if (tls_sll_push(class_idx, HAK_BASE_FROM_RAW(p), sll_capacity)) {
            moved++;
        } else {
            // SLL full, stop draining
            break;
        }
    }
}

static inline int tiny_remote_queue_contains_guard(SuperSlab* ss, int slab_idx, void* target) {
    if (!ss || slab_idx < 0) return 0;
    uintptr_t cur = atomic_load_explicit(&ss->remote_heads[slab_idx], memory_order_acquire);
    int limit = 8192;
    while (cur && limit-- > 0) {
        if ((void*)cur == target) {
            return 1;
        }
        uintptr_t next;
        if (__builtin_expect(g_remote_side_enable, 0)) {
            next = tiny_remote_side_get(ss, slab_idx, (void*)cur);
        } else {
            next = atomic_load_explicit((_Atomic uintptr_t*)cur, memory_order_relaxed);
        }
        cur = next;
    }
    if (cur) {
        // fail-safe: traversal was cut off by the limit, treat as duplicate
        // (checking cur rather than limit avoids a false positive when the
        // queue ends exactly as the budget runs out)
        return 1;
    }
    return 0;
}

// Phase 6.12.1: Free with pre-calculated slab (Option C - avoids duplicate lookup)
void hak_tiny_free_with_slab(void* ptr, TinySlab* slab) {
    // Phase 7.6: slab == NULL means SuperSlab mode (Magazine integration)
    SuperSlab* ss = NULL;
    TinySlabMeta* meta = NULL;
    int class_idx = -1;
    int slab_idx = -1;
    if (!slab) {
        // SuperSlab path: Get class_idx from SuperSlab
        ss = hak_super_lookup(ptr);
        if (!ss || ss->magic != SUPERSLAB_MAGIC) return;
        // Derive class_idx from per-slab metadata instead of ss->size_class
        class_idx = -1;
        // void* base = ptr_user_to_base_blind(ptr); // FIX: Use ptr (USER) directly
        slab_idx = slab_index_for(ss, ptr); // FIX: slab_index_for works better with ptr (USER) for C0/C7
        if (slab_idx >= 0 && slab_idx < ss_slabs_capacity(ss)) {
            TinySlabMeta* meta_probe = &ss->slabs[slab_idx];
            if (meta_probe->class_idx < TINY_NUM_CLASSES) {
                class_idx = (int)meta_probe->class_idx;
            }
        }
        size_t ss_size = (size_t)1ULL << ss->lg_size;
        uintptr_t ss_base = (uintptr_t)ss;
        if (__builtin_expect(class_idx < 0 || class_idx >= TINY_NUM_CLASSES, 0)) {
            tiny_debug_ring_record(TINY_RING_EVENT_SUPERSLAB_ADOPT_FAIL, (uint16_t)0xFFu, ss, (uintptr_t)class_idx);
            return;
        }
        // Optional: cross-lookup TinySlab owner and detect class mismatch early
        // Phase E1-CORRECT: All classes have headers now, standard safe_free guard
        if (__builtin_expect(g_tiny_safe_free, 0)) {
            TinySlab* ts = hak_tiny_owner_slab(ptr);
            if (ts) {
                int ts_cls = ts->class_idx;
                if (ts_cls >= 0 && ts_cls < TINY_NUM_CLASSES && ts_cls != class_idx) {
                    uint32_t code = 0xAA00u | ((uint32_t)ts_cls & 0xFFu);
                    uintptr_t aux = tiny_remote_pack_diag(code, ss_base, ss_size, (uintptr_t)ptr);
                    tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)class_idx, ptr, aux);
                    if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; }
                }
            }
        }
        tiny_debug_ring_record(TINY_RING_EVENT_FREE_ENTER, (uint16_t)class_idx, ptr, 0);
        // Detect cross-thread: cross-thread free MUST go via superslab path
        // FIX: Use ptr (USER) for slab index calculation to handle C0/C7 boundary correctly
        // base = ptr_user_to_base_blind(ptr);
        slab_idx = slab_index_for(ss, ptr);
        int ss_cap = ss_slabs_capacity(ss);
        if (__builtin_expect(slab_idx < 0 || slab_idx >= ss_cap, 0)) {
            tiny_debug_ring_record(TINY_RING_EVENT_SUPERSLAB_ADOPT_FAIL, (uint16_t)0xFEu, ss, (uintptr_t)slab_idx);
            return;
        }
        meta = &ss->slabs[slab_idx];
        if (__builtin_expect(g_tiny_safe_free, 0)) {
            size_t blk = g_tiny_class_sizes[class_idx];
            uint8_t* slab_base = tiny_slab_base_for(ss, slab_idx);
            // Phase E1-CORRECT: All classes have headers, validate block base using known class_idx
            uintptr_t delta = (uintptr_t)HAK_BASE_TO_RAW(ptr_user_to_base(HAK_USER_FROM_RAW(ptr), class_idx)) - (uintptr_t)slab_base;
            int cap_ok = (meta->capacity > 0) ? 1 : 0;
            int align_ok = (delta % blk) == 0;
            int range_ok = cap_ok && (delta / blk) < meta->capacity;
            if (!align_ok || !range_ok) {
                uint32_t code = 0xA100u;
                if (align_ok) code |= 0x2u;
                if (range_ok) code |= 0x1u;
                uintptr_t aux = tiny_remote_pack_diag(code, ss_base, ss_size, (uintptr_t)ptr);
                tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)class_idx, ptr, aux);
                if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; }
                return;
            }
        }
        uint32_t self_tid = tiny_self_u32();
        uint8_t self_tid_low = (uint8_t)self_tid;
        if (__builtin_expect(meta->owner_tid_low != self_tid_low || meta->owner_tid_low == 0, 0)) {
            // route directly to superslab (remote queue / freelist)
            uintptr_t ptr_val = (uintptr_t)ptr;
            uintptr_t ss_base = (uintptr_t)ss;
            size_t ss_size = (size_t)1ULL << ss->lg_size;
            if (__builtin_expect(ptr_val < ss_base || ptr_val >= ss_base + ss_size, 0)) {
                tiny_debug_ring_record(TINY_RING_EVENT_SUPERSLAB_ADOPT_FAIL, (uint16_t)0xFDu, ss, ptr_val);
                return;
            }
            tiny_debug_ring_record(TINY_RING_EVENT_FREE_REMOTE, (uint16_t)class_idx, ss, (uintptr_t)ptr);
            hak_tiny_free_superslab(ptr, ss);
            HAK_STAT_FREE(class_idx);
            return;
        }
        // A/B: Force SS freelist path for same-thread frees (publish on first-free)
        do {
            static int g_free_to_ss2 = -1;
            if (__builtin_expect(g_free_to_ss2 == -1, 0)) {
                const char* e = getenv("HAKMEM_TINY_FREE_TO_SS");
                g_free_to_ss2 = (e && *e && *e != '0') ? 1 : 0; // default OFF
            }
            if (g_free_to_ss2) {
                hak_tiny_free_superslab(ptr, ss);
                HAK_STAT_FREE(class_idx);
                return;
            }
        } while (0);
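
        /*
         * Usage sketch: setting HAKMEM_TINY_FREE_TO_SS to any value other
         * than '0' (e.g. HAKMEM_TINY_FREE_TO_SS=1) forces same-thread frees
         * straight onto the SuperSlab freelist, bypassing the TLS front
         * caches below; the gate is read once per process and defaults to off.
         */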

        if (__builtin_expect(g_debug_fast0, 0)) {
            tiny_debug_ring_record(TINY_RING_EVENT_FRONT_BYPASS, (uint16_t)class_idx, ptr, (uintptr_t)slab_idx);
            // Phase E1-CORRECT: ALL classes (C0-C7) have 1-byte header
            void* base = HAK_BASE_TO_RAW(ptr_user_to_base(HAK_USER_FROM_RAW(ptr), class_idx));
            void* prev = meta->freelist;
            tiny_next_write(class_idx, base, prev); // Box API: uses offset 1 for headers
            meta->freelist = base;
            meta->used--;
            ss_active_dec_one(ss);
            if (prev == NULL) {
                // Publish using the slab's class (per-slab class_idx)
                ss_partial_publish(class_idx, ss);
            }
            tiny_debug_ring_record(TINY_RING_EVENT_FREE_LOCAL, (uint16_t)class_idx, ptr, (uintptr_t)slab_idx);
            HAK_STAT_FREE(class_idx);
            return;
        }

        // Front-V2: try to return to TLS magazine first (A/B, default OFF)
        // Phase 7-Step8: Use config macro for dead code elimination in PGO mode
        if (__builtin_expect(TINY_FRONT_HEAP_V2_ENABLED && class_idx <= 3, 0)) {
            void* base = HAK_BASE_TO_RAW(ptr_user_to_base(HAK_USER_FROM_RAW(ptr), class_idx));
            if (tiny_heap_v2_try_push(class_idx, base)) {
                tiny_debug_ring_record(TINY_RING_EVENT_FREE_FAST, (uint16_t)class_idx, ptr, slab_idx);
                HAK_STAT_FREE(class_idx);
                return;
            }
        }

        if (g_fast_enable && g_fast_cap[class_idx] != 0) {
            // Phase E1-CORRECT: ALL classes (C0-C7) have 1-byte header
            hak_base_ptr_t base = ptr_user_to_base(HAK_USER_FROM_RAW(ptr), class_idx);
            int pushed = 0;
            // Phase 7-Step5: Use config macro for dead code elimination in PGO mode
            if (__builtin_expect(TINY_FRONT_FASTCACHE_ENABLED && class_idx <= 3, 1)) {
                pushed = fastcache_push(class_idx, base);
            } else {
                pushed = tiny_fast_push(class_idx, base);
            }
            if (pushed) {
                tiny_debug_ring_record(TINY_RING_EVENT_FREE_FAST, (uint16_t)class_idx, ptr, slab_idx);
                HAK_STAT_FREE(class_idx);
                return;
            }
        }

        if (g_tls_list_enable && class_idx != 7) {
            TinyTLSList* tls = &g_tls_lists[class_idx];
            uint32_t seq = atomic_load_explicit(&g_tls_param_seq[class_idx], memory_order_relaxed);
            if (__builtin_expect(seq != g_tls_param_seen[class_idx], 0)) {
                tiny_tls_refresh_params(class_idx, tls);
            }
            // TinyHotMag front push (8/16/32B, A/B)
            if (__builtin_expect(g_hotmag_enable && class_idx <= 2, 1)) {
                // Phase E1-CORRECT: ALL classes (C0-C7) have 1-byte header
                void* base = HAK_BASE_TO_RAW(ptr_user_to_base(HAK_USER_FROM_RAW(ptr), class_idx));
                if (hotmag_push(class_idx, base)) {
                    tiny_debug_ring_record(TINY_RING_EVENT_FREE_RETURN_MAG, (uint16_t)class_idx, ptr, 1);
                    HAK_STAT_FREE(class_idx);
                    return;
                }
            }
            if (tls->count < tls->cap) {
                // Phase E1-CORRECT: ALL classes (C0-C7) have 1-byte header
                void* base = HAK_BASE_TO_RAW(ptr_user_to_base(HAK_USER_FROM_RAW(ptr), class_idx));
                tiny_tls_list_guard_push(class_idx, tls, base);
                tls_list_push(tls, base, class_idx);
                tiny_debug_ring_record(TINY_RING_EVENT_FREE_LOCAL, (uint16_t)class_idx, ptr, 0);
                HAK_STAT_FREE(class_idx);
                return;
            }
            seq = atomic_load_explicit(&g_tls_param_seq[class_idx], memory_order_relaxed);
            if (__builtin_expect(seq != g_tls_param_seen[class_idx], 0)) {
                tiny_tls_refresh_params(class_idx, tls);
            }
            {
                // Phase E1-CORRECT: ALL classes (C0-C7) have 1-byte header
                void* base = HAK_BASE_TO_RAW(ptr_user_to_base(HAK_USER_FROM_RAW(ptr), class_idx));
                tiny_tls_list_guard_push(class_idx, tls, base);
                tls_list_push(tls, base, class_idx);
            }
            if (tls_list_should_spill(tls)) {
                tls_list_spill_excess(class_idx, tls);
            }
            tiny_debug_ring_record(TINY_RING_EVENT_FREE_LOCAL, (uint16_t)class_idx, ptr, 2);
            HAK_STAT_FREE(class_idx);
            return;
        }
    } else {
        // Derive ss from slab (2 MiB alignment) for TinySlab path
        ss = (SuperSlab*)((uintptr_t)slab & ~(uintptr_t)(2*1024*1024 - 1));
    }
|
2025-11-05 12:31:14 +09:00
|
|
|
|
|
Phase 1: Box Theory refactoring + include reduction
Phase 1-1: Split hakmem_tiny_free.inc (1,711 → 452 lines, -73%)
- Created tiny_free_magazine.inc.h (413 lines) - Magazine layer
- Created tiny_superslab_alloc.inc.h (394 lines) - SuperSlab alloc
- Created tiny_superslab_free.inc.h (305 lines) - SuperSlab free
Phase 1-2++: Refactor hakmem_pool.c (1,481 → 907 lines, -38.8%)
- Created pool_tls_types.inc.h (32 lines) - TLS structures
- Created pool_mf2_types.inc.h (266 lines) - MF2 data structures
- Created pool_mf2_helpers.inc.h (158 lines) - Helper functions
- Created pool_mf2_adoption.inc.h (129 lines) - Adoption logic
Phase 1-3: Reduce hakmem_tiny.c includes (60 → 46, -23.3%)
- Created tiny_system.h - System headers umbrella (stdio, stdlib, etc.)
- Created tiny_api.h - API headers umbrella (stats, query, rss, registry)
Performance: 4.19M ops/s maintained (±0% regression)
Verified: Larson benchmark 2×8×128×1024 = 4,192,128 ops/s
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-06 21:54:12 +09:00
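As a reference point, an umbrella header of the kind tiny_system.h describes is only an include aggregator; a sketch with assumed contents (the actual file's include list is not shown here):
```c
/* tiny_system.h — sketch of a system-header umbrella (contents assumed) */
#ifndef TINY_SYSTEM_H
#define TINY_SYSTEM_H
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stdatomic.h>
#endif /* TINY_SYSTEM_H */
```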
|
|
|
|
#include "tiny_free_magazine.inc.h"
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// ============================================================================
|
2025-11-05 12:31:14 +09:00
|
|
|
|
// Phase 6.23: SuperSlab Allocation Helpers
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// ============================================================================
|
2025-11-05 12:31:14 +09:00
|
|
|
|
|
|
|
|
|
|
// Phase 6.24: Allocate from SuperSlab slab (lazy freelist + linear allocation)
|
2025-11-06 21:54:12 +09:00
|
|
|
|
#include "tiny_superslab_alloc.inc.h"
|
|
|
|
|
|
#include "tiny_superslab_free.inc.h"
|
2025-11-05 12:31:14 +09:00
|
|
|
|
|
|
|
|
|
|
void hak_tiny_free(void* ptr) {
|
2025-12-16 07:31:15 +09:00
|
|
|
|
#if HAKMEM_TINY_FREE_TRACE_COMPILED
|
Implement Phase 1: TLS SuperSlab Hint Box for Headerless performance
Design: Cache recently-used SuperSlab references in TLS to accelerate
ptr→SuperSlab resolution in the Headerless-mode free() path.
## Implementation
### New Box: core/box/tls_ss_hint_box.h
- Header-only Box (4-slot FIFO cache per thread)
- Functions: tls_ss_hint_init(), tls_ss_hint_update(), tls_ss_hint_lookup(), tls_ss_hint_clear()
- Memory overhead: 112 bytes per thread (negligible)
- Statistics API for debug builds (hit/miss counters)
### Integration Points
1. **Free path** (core/hakmem_tiny_free.inc):
- Lines 477-481: Fast path hint lookup before hak_super_lookup()
- Lines 550-555: Second lookup location (fallback path)
- Expected savings: 10-50 cycles → 2-5 cycles on cache hit
2. **Allocation path** (core/tiny_superslab_alloc.inc.h):
- Lines 115-122: Linear allocation return path
- Lines 179-186: Freelist allocation return path
- Cache update on successful allocation
3. **TLS variable** (core/hakmem_tiny_tls_state_box.inc):
- `__thread TlsSsHintCache g_tls_ss_hint = {0};`
### Build System
- **Build flag** (core/hakmem_build_flags.h):
- HAKMEM_TINY_SS_TLS_HINT (default: 0, disabled)
- Validation: requires HAKMEM_TINY_HEADERLESS=1
- **Makefile**:
- Removed old ss_tls_hint_box.o (conflicting implementation)
- Header-only design eliminates compiled object files
### Testing
- **Unit tests** (tests/test_tls_ss_hint.c):
- 6 test functions covering init, lookup, FIFO rotation, duplicates, clear, stats
- All tests PASSING
- **Build validation**:
- ✅ Compiles with hint disabled (default)
- ✅ Compiles with hint enabled (HAKMEM_TINY_SS_TLS_HINT=1)
### Documentation
- **Benchmark report** (docs/PHASE1_TLS_HINT_BENCHMARK.md):
- Implementation summary
- Build validation results
- Benchmark methodology (to be executed)
- Performance analysis framework
## Expected Performance
- **Hit rate**: 85-95% (single-threaded), 70-85% (multi-threaded)
- **Cycle savings**: 80-95% on cache hit (10-50 cycles → 2-5 cycles)
- **Target improvement**: 15-20% throughput increase vs Headerless baseline
- **Memory overhead**: 112 bytes per thread
## Box Theory
**Mission**: Cache hot SuperSlabs to avoid global registry lookup
**Boundary**: ptr → SuperSlab* or NULL (miss)
**Invariant**: hint.base ≤ ptr < hint.end → hit is valid
**Fallback**: Always safe to miss (triggers hak_super_lookup)
**Thread Safety**: TLS storage, no synchronization required
**Risk**: Low (read-only cache, fail-safe fallback, magic validation)
## Next Steps
1. Run full benchmark suite (sh8bench, cfrac, larson)
2. Measure actual hit rate with stats enabled
3. If performance target met (15-20% improvement), enable by default
4. Consider increasing cache slots if hit rate < 80%
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 18:06:24 +09:00
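The Box's invariant (hint.base ≤ ptr < hint.end ⇒ hit) makes the cache easy to picture. A minimal sketch of a 4-slot FIFO hint cache; field and function names here are illustrative, not the actual TlsSsHintCache layout:
```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uintptr_t base, end; void* ss; } HintSlot;
typedef struct { HintSlot slot[4]; unsigned next; } TlsSsHintSketch;

static __thread TlsSsHintSketch g_hint_sketch; /* zero-initialized TLS */

/* Hit iff some slot satisfies base <= ptr < end; a miss is always safe
 * (the caller falls back to hak_super_lookup()). */
static inline int hint_lookup_sketch(void* ptr, void** out_ss) {
    uintptr_t p = (uintptr_t)ptr;
    for (int i = 0; i < 4; i++) {
        if (p >= g_hint_sketch.slot[i].base && p < g_hint_sketch.slot[i].end) {
            *out_ss = g_hint_sketch.slot[i].ss;
            return 1;
        }
    }
    return 0;
}

/* FIFO update: overwrite the oldest slot on each successful allocation. */
static inline void hint_update_sketch(void* ss, uintptr_t base, size_t size) {
    unsigned i = g_hint_sketch.next++ & 3u;
    g_hint_sketch.slot[i] = (HintSlot){ base, base + size, ss };
}
```
On a miss the caller simply falls back to hak_super_lookup(), which is why the Box is fail-safe.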
|
|
|
|
static _Atomic int g_tiny_free_trace = 0;
|
|
|
|
|
|
if (atomic_fetch_add_explicit(&g_tiny_free_trace, 1, memory_order_relaxed) < 128) {
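// rate-limit: only the first 128 free() entries emit the trace line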
|
|
|
|
|
|
HAK_TRACE("[hak_tiny_free_enter]\n");
|
|
|
|
|
|
}
|
2025-12-16 07:31:15 +09:00
|
|
|
|
#else
|
|
|
|
|
|
(void)0; // No-op when trace compiled out
|
|
|
|
|
|
#endif
|
2025-11-07 01:27:04 +09:00
|
|
|
|
// Track total tiny free calls (diagnostics)
|
|
|
|
|
|
extern _Atomic uint64_t g_hak_tiny_free_calls;
|
|
|
|
|
|
atomic_fetch_add_explicit(&g_hak_tiny_free_calls, 1, memory_order_relaxed);
|
2025-11-05 12:31:14 +09:00
|
|
|
|
if (!ptr || !g_tiny_initialized) return;
|
|
|
|
|
|
|
|
|
|
|
|
hak_tiny_stats_poll();
|
|
|
|
|
|
tiny_debug_ring_record(TINY_RING_EVENT_FREE_ENTER, 0, ptr, 0);
|
|
|
|
|
|
|
|
|
|
|
|
#ifdef HAKMEM_TINY_BENCH_SLL_ONLY
|
|
|
|
|
|
// Bench-only SLL-only free: push to TLS SLL for ≤64B when possible
|
|
|
|
|
|
{
|
|
|
|
|
|
int class_idx = -1;
|
|
|
|
|
|
if (g_use_superslab) {
|
2025-11-13 16:33:03 +09:00
|
|
|
|
// Resolve class_idx from per-slab metadata instead of ss->size_class
|
2025-11-05 12:31:14 +09:00
|
|
|
|
SuperSlab* ss = hak_super_lookup(ptr);
|
2025-11-13 16:33:03 +09:00
|
|
|
|
if (ss && ss->magic == SUPERSLAB_MAGIC) {
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// void* base = ptr_user_to_base_blind(ptr); // FIX: Use ptr
|
|
|
|
|
|
int sidx = slab_index_for(ss, ptr);
|
2025-11-13 16:33:03 +09:00
|
|
|
|
if (sidx >= 0 && sidx < ss_slabs_capacity(ss)) {
|
|
|
|
|
|
TinySlabMeta* m = &ss->slabs[sidx];
|
|
|
|
|
|
if (m->class_idx < TINY_NUM_CLASSES) {
|
|
|
|
|
|
class_idx = (int)m->class_idx;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
2025-11-05 12:31:14 +09:00
|
|
|
|
}
|
|
|
|
|
|
if (class_idx < 0) {
|
|
|
|
|
|
TinySlab* slab = hak_tiny_owner_slab(ptr);
|
|
|
|
|
|
if (slab) class_idx = slab->class_idx;
|
|
|
|
|
|
}
|
|
|
|
|
|
if (class_idx >= 0 && class_idx <= 3) {
|
|
|
|
|
|
uint32_t sll_cap = sll_cap_for_class(class_idx, (uint32_t)TINY_TLS_MAG_CAP);
|
2025-11-20 07:32:30 +09:00
|
|
|
|
if ((int)g_tls_sll[class_idx].count < (int)sll_cap) {
|
2025-11-08 01:18:37 +09:00
|
|
|
|
// CORRUPTION DEBUG: Validate ptr and head before TLS SLL write
|
|
|
|
|
|
if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
|
|
|
|
|
|
extern const size_t g_tiny_class_sizes[];
|
|
|
|
|
|
size_t blk = g_tiny_class_sizes[class_idx];
|
2025-12-01 16:37:59 +09:00
|
|
|
|
void* old_head = HAK_BASE_TO_RAW(g_tls_sll[class_idx].head);
|
2025-11-08 01:18:37 +09:00
|
|
|
|
|
|
|
|
|
|
// Validate ptr alignment
|
|
|
|
|
|
if (((uintptr_t)ptr % blk) != 0) {
|
|
|
|
|
|
fprintf(stderr, "[FAST_FREE_CORRUPT] ptr=%p misaligned (cls=%d blk=%zu offset=%zu)\n",
|
|
|
|
|
|
ptr, class_idx, blk, (uintptr_t)ptr % blk);
|
|
|
|
|
|
fprintf(stderr, "[FAST_FREE_CORRUPT] Attempting to push corrupted pointer to TLS SLL!\n");
|
|
|
|
|
|
abort();
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// Validate old_head alignment if not NULL
|
|
|
|
|
|
if (old_head && ((uintptr_t)old_head % blk) != 0) {
|
|
|
|
|
|
fprintf(stderr, "[FAST_FREE_CORRUPT] TLS SLL head=%p already corrupted! (cls=%d blk=%zu offset=%zu)\n",
|
|
|
|
|
|
old_head, class_idx, blk, (uintptr_t)old_head % blk);
|
|
|
|
|
|
fprintf(stderr, "[FAST_FREE_CORRUPT] Corruption detected BEFORE fast free write (ptr=%p)\n", ptr);
|
|
|
|
|
|
abort();
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
fprintf(stderr, "[FAST_FREE] cls=%d ptr=%p old_head=%p count=%u\n",
|
2025-11-20 07:32:30 +09:00
|
|
|
|
class_idx, ptr, old_head, g_tls_sll[class_idx].count);
|
2025-11-08 01:18:37 +09:00
|
|
|
|
}
|
|
|
|
|
|
|
2025-12-01 16:37:59 +09:00
|
|
|
|
// Phase 10: Convert User -> Base for TLS SLL push
|
|
|
|
|
|
hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
|
|
|
|
|
|
if (tls_sll_push(class_idx, base_ptr, sll_cap)) {
|
2025-11-10 16:48:20 +09:00
|
|
|
|
return; // Success
|
|
|
|
|
|
}
|
|
|
|
|
|
// Fall through if push fails (SLL full or C7)
|
2025-11-05 12:31:14 +09:00
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
#endif
|
|
|
|
|
|
|
|
|
|
|
|
if (g_tiny_ultra) {
|
|
|
|
|
|
int class_idx = -1;
|
|
|
|
|
|
if (g_use_superslab) {
|
2025-11-13 16:33:03 +09:00
|
|
|
|
// Resolve class_idx from per-slab metadata instead of ss->size_class
|
2025-11-05 12:31:14 +09:00
|
|
|
|
SuperSlab* ss = hak_super_lookup(ptr);
|
2025-11-13 16:33:03 +09:00
|
|
|
|
if (ss && ss->magic == SUPERSLAB_MAGIC) {
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// void* base = ptr_user_to_base_blind(ptr); // FIX: Use ptr
|
|
|
|
|
|
int sidx = slab_index_for(ss, ptr);
|
2025-11-13 16:33:03 +09:00
|
|
|
|
if (sidx >= 0 && sidx < ss_slabs_capacity(ss)) {
|
|
|
|
|
|
TinySlabMeta* m = &ss->slabs[sidx];
|
|
|
|
|
|
if (m->class_idx < TINY_NUM_CLASSES) {
|
|
|
|
|
|
class_idx = (int)m->class_idx;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
2025-11-05 12:31:14 +09:00
|
|
|
|
}
|
|
|
|
|
|
if (class_idx < 0) {
|
|
|
|
|
|
TinySlab* slab = hak_tiny_owner_slab(ptr);
|
|
|
|
|
|
if (slab) class_idx = slab->class_idx;
|
|
|
|
|
|
}
|
2025-11-13 05:21:36 +09:00
|
|
|
|
// Phase E1-CORRECT: C7 now has headers, can use TLS SLL like other classes
|
|
|
|
|
|
if (class_idx >= 0) {
|
2025-11-05 12:31:14 +09:00
|
|
|
|
// Ultra free: push directly to TLS SLL without magazine init
|
|
|
|
|
|
int sll_cap = ultra_sll_cap_for_class(class_idx);
|
2025-11-20 07:32:30 +09:00
|
|
|
|
if ((int)g_tls_sll[class_idx].count < sll_cap) {
|
2025-11-08 01:18:37 +09:00
|
|
|
|
// CORRUPTION DEBUG: Validate ptr and head before TLS SLL write
|
|
|
|
|
|
if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
|
|
|
|
|
|
extern const size_t g_tiny_class_sizes[];
|
|
|
|
|
|
size_t blk = g_tiny_class_sizes[class_idx];
|
2025-12-01 16:37:59 +09:00
|
|
|
|
void* old_head = HAK_BASE_TO_RAW(g_tls_sll[class_idx].head);
|
2025-11-08 01:18:37 +09:00
|
|
|
|
|
|
|
|
|
|
// Validate ptr alignment
|
|
|
|
|
|
if (((uintptr_t)ptr % blk) != 0) {
|
|
|
|
|
|
fprintf(stderr, "[ULTRA_FREE_CORRUPT] ptr=%p misaligned (cls=%d blk=%zu offset=%zu)\n",
|
|
|
|
|
|
ptr, class_idx, blk, (uintptr_t)ptr % blk);
|
|
|
|
|
|
fprintf(stderr, "[ULTRA_FREE_CORRUPT] Attempting to push corrupted pointer to TLS SLL!\n");
|
|
|
|
|
|
abort();
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// Validate old_head alignment if not NULL
|
|
|
|
|
|
if (old_head && ((uintptr_t)old_head % blk) != 0) {
|
|
|
|
|
|
fprintf(stderr, "[ULTRA_FREE_CORRUPT] TLS SLL head=%p already corrupted! (cls=%d blk=%zu offset=%zu)\n",
|
|
|
|
|
|
old_head, class_idx, blk, (uintptr_t)old_head % blk);
|
|
|
|
|
|
fprintf(stderr, "[ULTRA_FREE_CORRUPT] Corruption detected BEFORE ultra free write (ptr=%p)\n", ptr);
|
|
|
|
|
|
abort();
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
fprintf(stderr, "[ULTRA_FREE] cls=%d ptr=%p old_head=%p count=%u\n",
|
2025-11-20 07:32:30 +09:00
|
|
|
|
class_idx, ptr, old_head, g_tls_sll[class_idx].count);
|
2025-11-08 01:18:37 +09:00
|
|
|
|
}
|
|
|
|
|
|
|
2025-11-10 16:48:20 +09:00
|
|
|
|
// Use Box TLS-SLL API (C7-safe push)
|
|
|
|
|
|
// Note: C7 already rejected at line 334
|
|
|
|
|
|
{
|
2025-12-01 16:37:59 +09:00
|
|
|
|
// Phase 10: Convert User -> Base for TLS SLL push
|
|
|
|
|
|
hak_base_ptr_t base_ptr = hak_user_to_base(HAK_USER_FROM_RAW(ptr));
|
|
|
|
|
|
if (tls_sll_push(class_idx, base_ptr, (uint32_t)sll_cap)) {
|
2025-11-10 16:48:20 +09:00
|
|
|
|
// CORRUPTION DEBUG: Verify write succeeded
|
|
|
|
|
|
if (__builtin_expect(tiny_refill_failfast_level() >= 2, 0)) {
|
2025-12-01 16:37:59 +09:00
|
|
|
|
void* base = HAK_BASE_TO_RAW(base_ptr);
|
2025-11-13 05:21:36 +09:00
|
|
|
|
void* readback = tiny_next_read(class_idx, base); // Phase E1-CORRECT: Box API
|
2025-11-10 16:48:20 +09:00
|
|
|
|
(void)readback;
|
2025-12-01 16:37:59 +09:00
|
|
|
|
void* new_head = HAK_BASE_TO_RAW(g_tls_sll[class_idx].head);
|
2025-11-10 16:48:20 +09:00
|
|
|
|
if (new_head != base) {
|
|
|
|
|
|
fprintf(stderr, "[ULTRA_FREE_CORRUPT] Write verification failed! base=%p new_head=%p\n",
|
|
|
|
|
|
base, new_head);
|
|
|
|
|
|
abort();
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
return; // Success
|
2025-11-08 01:18:37 +09:00
|
|
|
|
}
|
|
|
|
|
|
}
|
2025-11-10 16:48:20 +09:00
|
|
|
|
// Fall through if push fails (SLL full)
|
2025-11-05 12:31:14 +09:00
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
// Fallback to existing path if class resolution fails
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
SuperSlab* fast_ss = NULL;
|
|
|
|
|
|
TinySlab* fast_slab = NULL;
|
|
|
|
|
|
int fast_class_idx = -1;
|
|
|
|
|
|
if (g_use_superslab) {
|
2025-12-03 18:06:24 +09:00
|
|
|
|
// Phase 1: Try TLS hint cache first (fast path for Headerless mode)
|
|
|
|
|
|
#if HAKMEM_TINY_SS_TLS_HINT
|
|
|
|
|
|
if (!tls_ss_hint_lookup(ptr, &fast_ss)) {
|
|
|
|
|
|
#endif
|
|
|
|
|
|
fast_ss = hak_super_lookup(ptr);
|
|
|
|
|
|
#if HAKMEM_TINY_SS_TLS_HINT
|
|
|
|
|
|
}
|
|
|
|
|
|
#endif
|
2025-11-05 12:31:14 +09:00
|
|
|
|
if (fast_ss && fast_ss->magic == SUPERSLAB_MAGIC) {
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// void* base = ptr_user_to_base_blind(ptr); // FIX: Use ptr
|
|
|
|
|
|
int sidx = slab_index_for(fast_ss, ptr);
|
2025-11-13 16:33:03 +09:00
|
|
|
|
if (sidx >= 0 && sidx < ss_slabs_capacity(fast_ss)) {
|
|
|
|
|
|
TinySlabMeta* m = &fast_ss->slabs[sidx];
|
|
|
|
|
|
if (m->class_idx < TINY_NUM_CLASSES) {
|
|
|
|
|
|
fast_class_idx = (int)m->class_idx;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
if (fast_class_idx < 0) {
|
2025-11-06 21:54:12 +09:00
|
|
|
|
fast_ss = NULL;
|
|
|
|
|
|
}
|
2025-11-05 12:31:14 +09:00
|
|
|
|
} else {
|
|
|
|
|
|
fast_ss = NULL;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
if (fast_class_idx < 0) {
|
|
|
|
|
|
fast_slab = hak_tiny_owner_slab(ptr);
|
|
|
|
|
|
if (fast_slab) fast_class_idx = fast_slab->class_idx;
|
|
|
|
|
|
}
|
|
|
|
|
|
// Safety: detect class mismatch (SS vs TinySlab) early
|
|
|
|
|
|
if (__builtin_expect(g_tiny_safe_free && fast_class_idx >= 0, 0)) {
|
|
|
|
|
|
int ss_cls = -1, ts_cls = -1;
|
|
|
|
|
|
SuperSlab* chk_ss = fast_ss ? fast_ss : (g_use_superslab ? hak_super_lookup(ptr) : NULL);
|
2025-11-13 16:33:03 +09:00
|
|
|
|
if (chk_ss && chk_ss->magic == SUPERSLAB_MAGIC) {
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// void* base = ptr_user_to_base_blind(ptr); // FIX: Use ptr
|
|
|
|
|
|
int sidx = slab_index_for(chk_ss, ptr);
|
2025-11-13 16:33:03 +09:00
|
|
|
|
if (sidx >= 0 && sidx < ss_slabs_capacity(chk_ss)) {
|
|
|
|
|
|
TinySlabMeta* m = &chk_ss->slabs[sidx];
|
|
|
|
|
|
if (m->class_idx < TINY_NUM_CLASSES) {
|
|
|
|
|
|
ss_cls = (int)m->class_idx;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
2025-11-05 12:31:14 +09:00
|
|
|
|
TinySlab* chk_slab = fast_slab ? fast_slab : hak_tiny_owner_slab(ptr);
|
|
|
|
|
|
if (chk_slab) ts_cls = chk_slab->class_idx;
|
|
|
|
|
|
if (ss_cls >= 0 && ts_cls >= 0 && ss_cls != ts_cls) {
|
|
|
|
|
|
uintptr_t packed = ((uintptr_t)(uint16_t)ss_cls << 16) | (uint16_t)ts_cls;
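// pack both views into one word: SS class in bits 16..31, TinySlab class in bits 0..15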
|
|
|
|
|
|
tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, (uint16_t)fast_class_idx, ptr, packed);
|
|
|
|
|
|
if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; }
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
if (fast_class_idx >= 0) {
|
|
|
|
|
|
tiny_debug_ring_record(TINY_RING_EVENT_FREE_ENTER, (uint16_t)fast_class_idx, ptr, 1);
|
|
|
|
|
|
}
|
|
|
|
|
|
if (fast_class_idx >= 0 && g_fast_enable && g_fast_cap[fast_class_idx] != 0) {
|
2025-11-13 05:21:36 +09:00
|
|
|
|
// Phase E1-CORRECT: ALL classes (C0-C7) have 1-byte header
|
2025-12-04 11:05:06 +09:00
|
|
|
|
hak_base_ptr_t base2 = ptr_user_to_base(HAK_USER_FROM_RAW(ptr), fast_class_idx);
|
Front-Direct implementation: SS→FC direct refill + SLL complete bypass
## Summary
Implemented Front-Direct architecture with complete SLL bypass:
- Direct SuperSlab → FastCache refill (1-hop, bypasses SLL)
- SLL-free allocation/free paths when Front-Direct enabled
- Legacy path sealing (SLL inline opt-in, SFC cascade ENV-only)
## New Modules
- core/refill/ss_refill_fc.h (236 lines): Standard SS→FC refill entry point
- Remote drain → Freelist → Carve priority
- Header restoration for C1-C6 (NOT C0/C7)
- ENV: HAKMEM_TINY_P0_DRAIN_THRESH, HAKMEM_TINY_P0_NO_DRAIN
- core/front/fast_cache.h: FastCache (L1) type definition
- core/front/quick_slot.h: QuickSlot (L0) type definition
## Allocation Path (core/tiny_alloc_fast.inc.h)
- Added s_front_direct_alloc TLS flag (lazy ENV check)
- SLL pop guarded by: g_tls_sll_enable && !s_front_direct_alloc
- Refill dispatch:
- Front-Direct: ss_refill_fc_fill() → fastcache_pop() (1-hop)
- Legacy: sll_refill_batch_from_ss() → SLL → FC (2-hop, A/B only)
- SLL inline pop sealed (requires HAKMEM_TINY_INLINE_SLL=1 opt-in)
## Free Path (core/hakmem_tiny_free.inc, core/hakmem_tiny_fastcache.inc.h)
- FC priority: Try fastcache_push() first (same-thread free)
- tiny_fast_push() bypass: Returns 0 when s_front_direct_free || !g_tls_sll_enable
- Fallback: Magazine/slow path (safe, bypasses SLL)
## Legacy Sealing
- SFC cascade: Default OFF (ENV-only via HAKMEM_TINY_SFC_CASCADE=1)
- Deleted: core/hakmem_tiny_free.inc.bak, core/pool_refill_legacy.c.bak
- Documentation: ss_refill_fc_fill() promoted as CANONICAL refill entry
## ENV Controls
- HAKMEM_TINY_FRONT_DIRECT=1: Enable Front-Direct (SS→FC direct)
- HAKMEM_TINY_P0_DIRECT_FC_ALL=1: Same as above (alt name)
- HAKMEM_TINY_REFILL_BATCH=1: Enable batch refill (also enables Front-Direct)
- HAKMEM_TINY_SFC_CASCADE=1: Enable SFC cascade (default OFF)
- HAKMEM_TINY_INLINE_SLL=1: Enable inline SLL pop (default OFF, requires AGGRESSIVE_INLINE)
## Benchmarks (Front-Direct Enabled)
```bash
ENV: HAKMEM_BENCH_FAST_FRONT=1 HAKMEM_TINY_FRONT_DIRECT=1
HAKMEM_TINY_REFILL_BATCH=1 HAKMEM_TINY_P0_DIRECT_FC_ALL=1
HAKMEM_TINY_REFILL_COUNT_HOT=256 HAKMEM_TINY_REFILL_COUNT_MID=96
HAKMEM_TINY_BUMP_CHUNK=256
bench_random_mixed (16-1040B random, 200K iter):
256 slots: 1.44M ops/s (STABLE, 0 SEGV)
128 slots: 1.44M ops/s (STABLE, 0 SEGV)
bench_fixed_size (fixed size, 200K iter):
256B: 4.06M ops/s (debug logging enabled; expected >10M ops/s without it)
128B: similar (likewise limited by debug logging)
```
## Verification
- TRACE_RING test (10K iter): **0 SLL events** detected ✅
- Complete SLL bypass confirmed when Front-Direct=1
- Stable execution: 200K iterations × multiple sizes, 0 SEGV
## Next Steps
- Disable debug logs in hak_alloc_api.inc.h (call_num 14250-14280 range)
- Re-benchmark with clean Release build (target: 10-15M ops/s)
- 128/256B shortcut path optimization (FC hit rate improvement)
Co-Authored-By: ChatGPT <chatgpt@openai.com>
Suggested-By: ultrathink
2025-11-14 05:41:49 +09:00
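For orientation, the alloc-side dispatch the bullets above describe can be sketched as follows; only the function and ENV names come from this commit, while the extern signatures and return conventions are assumptions:
```c
#include <stdlib.h>

/* Assumed signatures for the refill entry points named above. */
extern int   ss_refill_fc_fill(int class_idx);        /* SS → FC, 1-hop */
extern void* fastcache_pop(int class_idx);
extern int   sll_refill_batch_from_ss(int class_idx); /* SS → SLL → FC, 2-hop */
extern void* sll_pop(int class_idx);

static __thread int s_front_direct_alloc = -1; /* -1 = ENV not read yet (lazy check) */

static void* tiny_alloc_refill_sketch(int class_idx) {
    if (s_front_direct_alloc < 0)
        s_front_direct_alloc = getenv("HAKMEM_TINY_FRONT_DIRECT") != NULL;
    if (s_front_direct_alloc) {
        if (ss_refill_fc_fill(class_idx) > 0)
            return fastcache_pop(class_idx);   /* Front-Direct: SLL bypassed */
        return NULL;
    }
    if (sll_refill_batch_from_ss(class_idx) > 0)
        return sll_pop(class_idx);             /* legacy 2-hop path */
    return NULL;
}
```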
|
|
|
|
// PRIORITY 1: Try FastCache first (bypasses SLL when Front-Direct)
|
|
|
|
|
|
int pushed = 0;
|
2025-11-29 17:12:15 +09:00
|
|
|
|
// Phase 7-Step5: Use config macro for dead code elimination in PGO mode
|
|
|
|
|
|
if (__builtin_expect(TINY_FRONT_FASTCACHE_ENABLED && fast_class_idx <= 3, 1)) {
|
2025-11-14 05:41:49 +09:00
|
|
|
|
pushed = fastcache_push(fast_class_idx, base2);
|
|
|
|
|
|
} else {
|
|
|
|
|
|
pushed = tiny_fast_push(fast_class_idx, base2);
|
|
|
|
|
|
}
|
|
|
|
|
|
if (pushed) {
|
2025-11-05 12:31:14 +09:00
|
|
|
|
tiny_debug_ring_record(TINY_RING_EVENT_FREE_FAST, (uint16_t)fast_class_idx, ptr, 0);
|
|
|
|
|
|
HAK_STAT_FREE(fast_class_idx);
|
|
|
|
|
|
return;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// SuperSlab detection: prefer fast mask-based check when available
|
|
|
|
|
|
SuperSlab* ss = fast_ss;
|
|
|
|
|
|
if (!ss && g_use_superslab) {
|
2025-12-03 18:06:24 +09:00
|
|
|
|
// Phase 1: Try TLS hint cache first (fast path for Headerless mode)
|
|
|
|
|
|
#if HAKMEM_TINY_SS_TLS_HINT
|
|
|
|
|
|
if (!tls_ss_hint_lookup(ptr, &ss)) {
|
|
|
|
|
|
#endif
|
|
|
|
|
|
ss = hak_super_lookup(ptr);
|
|
|
|
|
|
#if HAKMEM_TINY_SS_TLS_HINT
|
|
|
|
|
|
}
|
|
|
|
|
|
#endif
|
2025-11-05 12:31:14 +09:00
|
|
|
|
if (!(ss && ss->magic == SUPERSLAB_MAGIC)) {
|
|
|
|
|
|
ss = NULL;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
if (ss && ss->magic == SUPERSLAB_MAGIC) {
|
2025-11-13 16:33:03 +09:00
|
|
|
|
// Derive class from per-slab meta
|
|
|
|
|
|
int cls = -1;
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// void* base = ptr_user_to_base_blind(ptr); // FIX: Use ptr
|
|
|
|
|
|
int sidx = slab_index_for(ss, ptr);
|
2025-11-13 16:33:03 +09:00
|
|
|
|
if (sidx >= 0 && sidx < ss_slabs_capacity(ss)) {
|
|
|
|
|
|
TinySlabMeta* m = &ss->slabs[sidx];
|
|
|
|
|
|
if (m->class_idx < TINY_NUM_CLASSES) {
|
|
|
|
|
|
cls = (int)m->class_idx;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
if (cls < 0) {
|
|
|
|
|
|
if (g_tiny_safe_free_strict) { raise(SIGUSR2); }
|
2025-11-06 21:54:12 +09:00
|
|
|
|
return;
|
|
|
|
|
|
}
|
2025-11-05 12:31:14 +09:00
|
|
|
|
hak_tiny_free_superslab(ptr, ss);
|
2025-11-13 16:33:03 +09:00
|
|
|
|
HAK_STAT_FREE(cls);
|
2025-11-05 12:31:14 +09:00
|
|
|
|
return;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
// Fallback to TinySlab only when SuperSlab is not in use
|
|
|
|
|
|
TinySlab* slab = fast_slab;
|
|
|
|
|
|
if (!slab) slab = hak_tiny_owner_slab(ptr);
|
|
|
|
|
|
if (!slab) return; // Not managed by Tiny Pool
|
|
|
|
|
|
if (__builtin_expect(g_use_superslab, 0)) {
|
|
|
|
|
|
// In SS mode, a pointer that resolves only to TinySlab is suspicious → treat as invalid free
|
|
|
|
|
|
tiny_debug_ring_record(TINY_RING_EVENT_REMOTE_INVALID, 0xEE, ptr, 0xF1u);
|
|
|
|
|
|
if (g_tiny_safe_free_strict) { raise(SIGUSR2); return; }
|
|
|
|
|
|
return;
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
hak_tiny_free_with_slab(ptr, slab);
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// ============================================================================
|
2025-11-05 12:31:14 +09:00
|
|
|
|
// EXTRACTED TO hakmem_tiny_query.c (Phase 2B-1)
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// ============================================================================
|
2025-11-05 12:31:14 +09:00
|
|
|
|
// EXTRACTED: int hak_tiny_is_managed(void* ptr) {
|
|
|
|
|
|
// EXTRACTED: if (!ptr || !g_tiny_initialized) return 0;
|
|
|
|
|
|
// EXTRACTED: // Phase 6.12.1: O(1) slab lookup via registry/list
|
|
|
|
|
|
// EXTRACTED: return hak_tiny_owner_slab(ptr) != NULL || hak_super_lookup(ptr) != NULL;
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
|
|
|
|
|
|
// Phase 7.6: Check if pointer is managed by Tiny Pool (TinySlab OR SuperSlab)
|
|
|
|
|
|
// EXTRACTED: int hak_tiny_is_managed_superslab(void* ptr) {
|
|
|
|
|
|
// EXTRACTED: if (!ptr || !g_tiny_initialized) return 0;
|
|
|
|
|
|
// EXTRACTED:
|
|
|
|
|
|
// EXTRACTED: // Safety: Only check if g_use_superslab is enabled
|
|
|
|
|
|
// EXTRACTED: if (g_use_superslab) {
|
|
|
|
|
|
// EXTRACTED: SuperSlab* ss = hak_super_lookup(ptr);
|
|
|
|
|
|
// EXTRACTED: // Phase 8.2 optimization: Use alignment check instead of mincore()
|
|
|
|
|
|
// EXTRACTED: // SuperSlabs are always SUPERSLAB_SIZE-aligned (2MB)
|
|
|
|
|
|
// EXTRACTED: if (ss && ((uintptr_t)ss & (SUPERSLAB_SIZE - 1)) == 0) {
|
|
|
|
|
|
// EXTRACTED: if (ss->magic == SUPERSLAB_MAGIC) {
|
|
|
|
|
|
// EXTRACTED: return 1; // Valid SuperSlab pointer
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
// EXTRACTED:
|
|
|
|
|
|
// EXTRACTED: // Fallback to TinySlab check
|
|
|
|
|
|
// EXTRACTED: return hak_tiny_owner_slab(ptr) != NULL;
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
|
|
|
|
|
|
// Return the usable size for a Tiny-managed pointer (0 if unknown/not tiny).
|
|
|
|
|
|
// Prefer SuperSlab metadata when available; otherwise use TinySlab owner class.
|
|
|
|
|
|
// EXTRACTED: size_t hak_tiny_usable_size(void* ptr) {
|
|
|
|
|
|
// EXTRACTED: if (!ptr || !g_tiny_initialized) return 0;
|
|
|
|
|
|
// EXTRACTED:
|
|
|
|
|
|
// EXTRACTED: // Check SuperSlab first via registry (safe under direct link and LD)
|
|
|
|
|
|
// EXTRACTED: if (g_use_superslab) {
|
|
|
|
|
|
// EXTRACTED: SuperSlab* ss = hak_super_lookup(ptr);
|
|
|
|
|
|
// EXTRACTED: if (ss && ss->magic == SUPERSLAB_MAGIC) {
|
|
|
|
|
|
// EXTRACTED: int k = (int)ss->size_class;
|
|
|
|
|
|
// EXTRACTED: if (k >= 0 && k < TINY_NUM_CLASSES) {
|
|
|
|
|
|
// EXTRACTED: return g_tiny_class_sizes[k];
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
// EXTRACTED:
|
|
|
|
|
|
// EXTRACTED: // Fallback: TinySlab owner lookup
|
|
|
|
|
|
// EXTRACTED: TinySlab* slab = hak_tiny_owner_slab(ptr);
|
|
|
|
|
|
// EXTRACTED: if (slab) {
|
|
|
|
|
|
// EXTRACTED: int k = slab->class_idx;
|
|
|
|
|
|
// EXTRACTED: if (k >= 0 && k < TINY_NUM_CLASSES) {
|
|
|
|
|
|
// EXTRACTED: return g_tiny_class_sizes[k];
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
// EXTRACTED: return 0;
|
|
|
|
|
|
// EXTRACTED: }
|
|
|
|
|
|
|
|
|
|
|
|
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// ============================================================================
|
2025-11-05 12:31:14 +09:00
|
|
|
|
// Statistics and Debug Functions - Extracted to hakmem_tiny_stats.c
|
2025-12-03 12:29:31 +09:00
|
|
|
|
// ============================================================================
|
2025-11-05 12:31:14 +09:00
|
|
|
|
// (Phase 2B API headers moved to top of file)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
// Optional shutdown hook to stop background components (e.g., Intelligence Engine)
|
|
|
|
|
|
void hak_tiny_shutdown(void) {
|
|
|
|
|
|
// Release TLS SuperSlab references (dec refcount) before stopping BG/INT
|
|
|
|
|
|
for (int k = 0; k < TINY_NUM_CLASSES; k++) {
|
|
|
|
|
|
TinyTLSSlab* tls = &g_tls_slabs[k];
|
|
|
|
|
|
if (tls->ss) {
|
|
|
|
|
|
superslab_ref_dec(tls->ss);
|
|
|
|
|
|
tls->ss = NULL;
|
|
|
|
|
|
tls->meta = NULL;
|
|
|
|
|
|
tls->slab_base = NULL;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
if (g_int_engine && g_int_started) {
|
|
|
|
|
|
g_int_stop = 1;
|
|
|
|
|
|
// Best-effort join; avoid deadlock if called from within the thread
|
|
|
|
|
|
if (!pthread_equal(tiny_self_pt(), g_int_thread)) {
|
|
|
|
|
|
pthread_join(g_int_thread, NULL);
|
|
|
|
|
|
}
|
|
|
|
|
|
g_int_started = 0;
|
|
|
|
|
|
g_int_engine = 0;
|
|
|
|
|
|
}
|
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
2025-12-03 18:06:24 +09:00
|
|
|
|
// Always-available: Trim empty slabs (release fully-free slabs)
|