Phase FREE-LEGACY-OPT-4-2/4-3: C6 ULTRA-free TLS cache + segment learning

Phase 4-2:
- Add TinyC6UltraFreeTLS structure with 128-slot TLS freelist
- Implement tiny_c6_ultra_free_fast/slow for C6 free hot path
- Add c6_ultra_free_fast counter to FreePathStats
- ENV gate: HAKMEM_TINY_C6_ULTRA_FREE_ENABLED (default: OFF)

Phase 4-3:
- Add segment learning on first C6 free via ss_fast_lookup()
- Learn seg_base/seg_end from SuperSlab for range check
- Increase cache capacity from 32 to 128 blocks

Results:
- Segment learning works: fast path captures blocks in segment
- However, without alloc integration, cache fills up and overflows to legacy
- Net effect: +1-3% (within noise range)
- Drain strategy also tested: no benefit (equal overhead)

Conclusion:
- Free-only TLS cache is limited without alloc-side integration
- Core v6 already has alloc/free integrated TLS (but -12% slower)
- Keep as research box (ENV default OFF)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -16,9 +16,10 @@ static void free_path_stats_dump(void) {
         return;
     }

-    fprintf(stderr, "[FREE_PATH_STATS] total=%lu c7_ultra=%lu small_v3=%lu v6=%lu tiny_v1=%lu pool_v1=%lu remote=%lu super_lookup=%lu legacy_fb=%lu\n",
+    fprintf(stderr, "[FREE_PATH_STATS] total=%lu c7_ultra=%lu c6_ultra_free=%lu small_v3=%lu v6=%lu tiny_v1=%lu pool_v1=%lu remote=%lu super_lookup=%lu legacy_fb=%lu\n",
             g_free_path_stats.total_calls,
             g_free_path_stats.c7_ultra_fast,
+            g_free_path_stats.c6_ultra_free_fast,  // Phase 4-2
             g_free_path_stats.smallheap_v3_fast,
             g_free_path_stats.smallheap_v6_fast,
             g_free_path_stats.tiny_heap_v1_fast,
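The diff above shows only the stats counter, so here is a minimal sketch of the free-side cache the commit message describes: learn the segment range on the first free, range-check later frees, and push into a 128-slot array. This is not the code from this commit. The struct layout, the `*_sketch` names, and the 2 MiB aligned-segment rule standing in for `ss_fast_lookup()` are all assumptions made for illustration. The ENV gate is omitted, and the cache is passed by pointer so it is easy to exercise; a real version would presumably keep it in a `_Thread_local` variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define C6_SKETCH_CAP   128                  /* slot count from the commit message */
#define SKETCH_SEG_SIZE ((uintptr_t)1 << 21) /* ASSUMPTION: 2 MiB aligned segments */

/* Hypothetical layout; the real TinyC6UltraFreeTLS may differ. */
typedef struct {
    void     *slots[C6_SKETCH_CAP];
    uint32_t  count;
    uintptr_t seg_base; /* learned on first free */
    uintptr_t seg_end;  /* 0 means "not learned yet" */
} C6UltraFreeSketch;

/* Stand-in for ss_fast_lookup(): derives the range by alignment only.
 * The real lookup reads it from the SuperSlab metadata. */
static void sketch_learn_segment(C6UltraFreeSketch *c, const void *p) {
    c->seg_base = (uintptr_t)p & ~(SKETCH_SEG_SIZE - 1);
    c->seg_end  = c->seg_base + SKETCH_SEG_SIZE;
}

/* Free fast path. Returns true if the block was cached; false means the
 * caller must fall back to the legacy free path. */
bool c6_free_sketch(C6UltraFreeSketch *c, void *p) {
    uintptr_t a = (uintptr_t)p;
    if (c->seg_end == 0) {
        sketch_learn_segment(c, p);          /* segment learning, first free only */
    }
    if (a < c->seg_base || a >= c->seg_end) {
        return false;                        /* outside the learned segment */
    }
    if (c->count >= C6_SKETCH_CAP) {
        return false;                        /* cache full: overflow to legacy */
    }
    c->slots[c->count++] = p;
    assert(c->count <= C6_SKETCH_CAP);
    return true;
}

/* Alloc-side pop. The commit reports that this half is NOT integrated,
 * which is why the cache fills up; it is shown here only for contrast. */
void *c6_alloc_sketch(C6UltraFreeSketch *c) {
    return c->count ? c->slots[--c->count] : NULL;
}
```

Under these assumptions the reported result is easy to see: with no caller of `c6_alloc_sketch()`, the cache saturates after 128 in-segment frees, and every later free takes the range check and the capacity check before falling back to legacy anyway.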