hakmem/core/hakmem_tiny_tls_state_box.inc


Performance Measurement Framework: Unified Cache, TLS SLL, Shared Pool Analysis ## Summary Implemented production-grade measurement infrastructure to quantify top 3 bottlenecks: - Unified cache hit/miss rates + refill cost - TLS SLL usage patterns - Shared pool lock contention distribution ## Changes ### 1. Unified Cache Metrics (tiny_unified_cache.h/c) - Added atomic counters: - g_unified_cache_hits_global: successful cache pops - g_unified_cache_misses_global: refill triggers - g_unified_cache_refill_cycles_global: refill cost in CPU cycles (rdtsc) - Instrumented `unified_cache_pop_or_refill()` to count hits - Instrumented `unified_cache_refill()` with cycle measurement - ENV-gated: HAKMEM_MEASURE_UNIFIED_CACHE=1 (default: off) - Added unified_cache_print_measurements() output function ### 2. TLS SLL Metrics (tls_sll_box.h) - Added atomic counters: - g_tls_sll_push_count_global: total pushes - g_tls_sll_pop_count_global: successful pops - g_tls_sll_pop_empty_count_global: empty list conditions - Instrumented push/pop paths - Added tls_sll_print_measurements() output function ### 3. Shared Pool Contention (hakmem_shared_pool_acquire.c) - Added atomic counters: - g_sp_stage2_lock_acquired_global: Stage 2 locks - g_sp_stage3_lock_acquired_global: Stage 3 allocations - g_sp_alloc_lock_contention_global: total lock acquisitions - Instrumented all pthread_mutex_lock calls in hot paths - Added shared_pool_print_measurements() output function ### 4. 
Benchmark Integration (bench_random_mixed.c) - Called all 3 print functions after benchmark loop - Functions active only when HAKMEM_MEASURE_UNIFIED_CACHE=1 set ## Design Principles - **Zero overhead when disabled**: Inline checks with __builtin_expect hints - **Atomic relaxed memory order**: Minimal synchronization overhead - **ENV-gated**: Single flag controls all measurements - **Production-safe**: Compiles in release builds, no functional changes ## Usage ```bash HAKMEM_MEASURE_UNIFIED_CACHE=1 ./bench_allocators_hakmem bench_random_mixed_hakmem 1000000 256 42 ``` Output (when enabled): ``` ======================================== Unified Cache Statistics ======================================== Hits: 1234567 Misses: 56789 Hit Rate: 95.6% Avg Refill Cycles: 1234 ======================================== TLS SLL Statistics ======================================== Total Pushes: 1234567 Total Pops: 345678 Pop Empty Count: 12345 Hit Rate: 98.8% ======================================== Shared Pool Contention Statistics ======================================== Stage 2 Locks: 123456 (33%) Stage 3 Locks: 234567 (67%) Total Contention: 357 locks per 1M ops ``` ## Next Steps 1. **Enable measurements** and run benchmarks to gather data 2. **Analyze miss rates**: Which bottleneck dominates? 3. **Profile hottest stage**: Focus optimization on top contributor 4. Possible targets: - Increase unified cache capacity if miss rate >5% - Profile if TLS SLL is unused (potential legacy code removal) - Analyze if Stage 2 lock can be replaced with CAS ## Makefile Updates Added core/box/tiny_route_box.o to: - OBJS_BASE (test build) - SHARED_OBJS (shared library) - BENCH_HAKMEM_OBJS_BASE (benchmark) - TINY_BENCH_OBJS_BASE (tiny benchmark) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 18:26:39 +09:00
// ============================================================================
// Performance Measurement: TLS SLL Hit Rate (ENV-gated)
// ============================================================================
// Global atomic counters for TLS SLL performance measurement
// ENV: HAKMEM_MEASURE_UNIFIED_CACHE=1 to enable (default: OFF)
#include <stdatomic.h>
#include "box/tiny_heap_env_box.h" // TinyHeap/C7 gate for TLS SLL skips
_Atomic uint64_t g_tls_sll_push_count_global = 0;
_Atomic uint64_t g_tls_sll_pop_count_global = 0;
_Atomic uint64_t g_tls_sll_pop_empty_count_global = 0;
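The counters above are meant to be bumped only when measurement is switched on. The following is a minimal sketch of that pattern, assuming a lazily cached environment check and relaxed atomic increments. All `demo_*` names are hypothetical, and the hit-rate formula (pops divided by pops plus empty pops) is an assumption rather than something the source confirms.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the global TLS SLL counters. */
static _Atomic uint64_t demo_pop_count = 0;
static _Atomic uint64_t demo_pop_empty_count = 0;

/* -1 = not yet read, 0 = disabled, 1 = enabled. */
static int demo_measure = -1;

/* Read the environment once and cache the answer.
 * The lazy init is not synchronized; racing threads would all compute
 * the same value, which is tolerable for a diagnostics flag. */
static inline int demo_measure_enabled(void) {
    if (__builtin_expect(demo_measure < 0, 0)) {
        const char* e = getenv("HAKMEM_MEASURE_UNIFIED_CACHE");
        demo_measure = (e != NULL && e[0] == '1');
    }
    return demo_measure;
}

/* Record one pop attempt; the disabled case is hinted as the likely one. */
static inline void demo_record_pop(int was_empty) {
    if (__builtin_expect(!demo_measure_enabled(), 1)) return;
    if (was_empty) {
        atomic_fetch_add_explicit(&demo_pop_empty_count, 1, memory_order_relaxed);
    } else {
        atomic_fetch_add_explicit(&demo_pop_count, 1, memory_order_relaxed);
    }
}

/* Assumed definition: successful pops as a percentage of all pop attempts. */
static double demo_pop_hit_rate(void) {
    uint64_t hits  = atomic_load_explicit(&demo_pop_count, memory_order_relaxed);
    uint64_t empty = atomic_load_explicit(&demo_pop_empty_count, memory_order_relaxed);
    uint64_t total = hits + empty;
    return total ? 100.0 * (double)hits / (double)total : 0.0;
}
```

Relaxed ordering is enough here because the counters are only summed for reporting and never used to synchronize other memory.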
Refactor: Extract 3 more Box modules from hakmem_tiny.c (-70% total reduction) Continue hakmem_tiny.c refactoring with 3 large module extractions. ## Changes **hakmem_tiny.c**: 995 → 616 lines (-379 lines, -38% this phase) **Total reduction**: 2081 → 616 lines (-1465 lines, -70% cumulative) 🏆 ## Extracted Modules (3 new boxes) 6. **tls_state_box** (224 lines) - TLS SLL enable flags and configuration - TLS canaries and SLL array definitions - Debug counters (path, ultra, allocation) - Frontend/backend configuration - TLS thread ID caching helpers - Frontend hit/miss counters - HotMag, QuickSlot, Ultra-front configuration - Helper functions (is_hot_class, tiny_optional_push) - Intelligence system helpers 7. **legacy_slow_box** (96 lines) - tiny_slow_alloc_fast() function (cold/unused) - Legacy slab-based allocation with refill - TLS cache/fast cache refill from slabs - Remote drain handling - List management (move to full/free lists) - Marked __attribute__((cold, noinline, unused)) 8. **slab_lookup_box** (77 lines) - registry_lookup() - O(1) hash-based lookup - hak_tiny_owner_slab() - public API for slab discovery - Linear probing search with atomic owner access - O(N) fallback for non-registry mode - Safety validation for membership checking ## Cumulative Progress (8 boxes total) **Previously extracted** (Phase 1): 1. config_box (211 lines) 2. publish_box (419 lines) 3. globals_box (256 lines) 4. phase6_wrappers_box (122 lines) 5. ace_guard_box (100 lines) **This phase** (Phase 2): 6. tls_state_box (224 lines) 7. legacy_slow_box (96 lines) 8. 
slab_lookup_box (77 lines) **Total extracted**: 1,505 lines across 8 coherent modules **Remaining core**: 616 lines (well-organized, focused) ## Benefits - **Readability**: 2k monolith → focused 616-line core - **Maintainability**: Each box has single responsibility - **Organization**: TLS state, legacy code, lookup utilities separated - **Build**: All modules compile successfully ✅ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 01:23:59 +09:00
// Hot-path cheap sampling counter to avoid rand() in allocation path
// Phase 9.4: TLS singly linked freelist (mimalloc-inspired) for hottest classes (≤128B/≤256B)
int g_tls_sll_enable = 1; // HAKMEM_TINY_TLS_SLL=0 to disable
// Phase 6-1.7: Export TLS variables for box refactor (Box 5/6 need access from hakmem.c)
// CRITICAL FIX: Explicit initializers prevent SEGV from uninitialized TLS in worker threads
// PRIORITY 3: TLS Canaries - Add canaries around TLS arrays to detect buffer overruns
#define TLS_CANARY_MAGIC 0xDEADBEEFDEADBEEFULL
// Phase 3d-B: Unified TLS SLL (head+count in same cache line for +12-18% cache hit rate)
Code Cleanup: Remove false positives, redundant validations, and reduce verbose logging Following the C7 stride upgrade fix (commit 23c0d9541), this commit performs comprehensive cleanup to improve code quality and reduce debug noise. ## Changes ### 1. Disable False Positive Checks (tiny_nextptr.h) - **Disabled**: NXT_MISALIGN validation block with `#if 0` - **Reason**: Produces false positives due to slab base offsets (2048, 65536) not being stride-aligned, causing all blocks to appear "misaligned" - **TODO**: Reimplement to check stride DISTANCE between consecutive blocks instead of absolute alignment to stride boundaries ### 2. Remove Redundant Geometry Validations **hakmem_tiny_refill_p0.inc.h (P0 batch refill)** - Removed 25-line CARVE_GEOMETRY_FIX validation block - Replaced with NOTE explaining redundancy - **Reason**: Stride table is now correct in tiny_block_stride_for_class(), defense-in-depth validation adds overhead without benefit **ss_legacy_backend_box.c (legacy backend)** - Removed 18-line LEGACY_FIX_GEOMETRY validation block - Replaced with NOTE explaining redundancy - **Reason**: Shared_pool validates geometry at acquisition time ### 3. Reduce Verbose Logging **hakmem_shared_pool.c (sp_fix_geometry_if_needed)** - Made SP_FIX_GEOMETRY logging conditional on `!HAKMEM_BUILD_RELEASE` - **Reason**: Geometry fixes are expected during stride upgrades, no need to log in release builds ### 4. 
Verification - Build: ✅ Successful (LTO warnings expected) - Test: ✅ 10K iterations (1.87M ops/s, no crashes) - NXT_MISALIGN false positives: ✅ Eliminated ## Files Modified - core/tiny_nextptr.h - Disabled false positive NXT_MISALIGN check - core/hakmem_tiny_refill_p0.inc.h - Removed redundant CARVE validation - core/box/ss_legacy_backend_box.c - Removed redundant LEGACY validation - core/hakmem_shared_pool.c - Made SP_FIX_GEOMETRY logging debug-only ## Impact - **Code clarity**: Removed 43 lines of redundant validation code - **Debug noise**: Reduced false positive diagnostics - **Performance**: Eliminated overhead from redundant geometry checks - **Maintainability**: Single source of truth for geometry validation 🧹 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 23:00:24 +09:00
#include "front/tiny_heap_v2.h"
__thread uint64_t g_tls_canary_before_sll = TLS_CANARY_MAGIC;
__thread TinyTLSSLL g_tls_sll[TINY_NUM_CLASSES] = {0};
__thread uint64_t g_tls_canary_after_sll = TLS_CANARY_MAGIC;
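The canaries bracket the TLS SLL array so that a later check can notice an overrun. The following is a minimal sketch of such a check. The `demo_*` names and the `DemoSLL` layout are hypothetical, and the sketch assumes the toolchain places the three TLS objects next to each other in declaration order, which C does not guarantee.

```c
#include <stdint.h>

#define DEMO_CANARY_MAGIC 0xDEADBEEFDEADBEEFULL
#define DEMO_NUM_CLASSES  8

/* Hypothetical per-class list head; the real TinyTLSSLL layout is not shown here. */
typedef struct {
    void*    head;
    uint32_t count;
} DemoSLL;

/* Canary, guarded array, canary: same declaration order as in the source. */
__thread uint64_t demo_canary_before = DEMO_CANARY_MAGIC;
__thread DemoSLL  demo_sll[DEMO_NUM_CLASSES] = {0};
__thread uint64_t demo_canary_after = DEMO_CANARY_MAGIC;

/* Returns 1 while both canaries still hold the magic value, 0 otherwise. */
static int demo_tls_canaries_intact(void) {
    return demo_canary_before == DEMO_CANARY_MAGIC &&
           demo_canary_after  == DEMO_CANARY_MAGIC;
}
```

A check like this only detects writes that actually land on a canary, so it is a debugging aid rather than a guarantee.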
__thread const char* g_tls_sll_last_writer[TINY_NUM_CLASSES] = {0};
__thread TinyHeapV2Mag g_tiny_heap_v2_mag[TINY_NUM_CLASSES] = {0};
__thread TinyHeapV2Stats g_tiny_heap_v2_stats[TINY_NUM_CLASSES] = {0};
__thread int g_tls_heap_v2_initialized = 0;
Implement Phase 1: TLS SuperSlab Hint Box for Headerless performance Design: Cache recently-used SuperSlab references in TLS to accelerate ptr→SuperSlab resolution in Headerless mode free() path. ## Implementation ### New Box: core/box/tls_ss_hint_box.h - Header-only Box (4-slot FIFO cache per thread) - Functions: tls_ss_hint_init(), tls_ss_hint_update(), tls_ss_hint_lookup(), tls_ss_hint_clear() - Memory overhead: 112 bytes per thread (negligible) - Statistics API for debug builds (hit/miss counters) ### Integration Points 1. **Free path** (core/hakmem_tiny_free.inc): - Lines 477-481: Fast path hint lookup before hak_super_lookup() - Lines 550-555: Second lookup location (fallback path) - Expected savings: 10-50 cycles → 2-5 cycles on cache hit 2. **Allocation path** (core/tiny_superslab_alloc.inc.h): - Lines 115-122: Linear allocation return path - Lines 179-186: Freelist allocation return path - Cache update on successful allocation 3. **TLS variable** (core/hakmem_tiny_tls_state_box.inc): - `__thread TlsSsHintCache g_tls_ss_hint = {0};` ### Build System - **Build flag** (core/hakmem_build_flags.h): - HAKMEM_TINY_SS_TLS_HINT (default: 0, disabled) - Validation: requires HAKMEM_TINY_HEADERLESS=1 - **Makefile**: - Removed old ss_tls_hint_box.o (conflicting implementation) - Header-only design eliminates compiled object files ### Testing - **Unit tests** (tests/test_tls_ss_hint.c): - 6 test functions covering init, lookup, FIFO rotation, duplicates, clear, stats - All tests PASSING - **Build validation**: - ✅ Compiles with hint disabled (default) - ✅ Compiles with hint enabled (HAKMEM_TINY_SS_TLS_HINT=1) ### Documentation - **Benchmark report** (docs/PHASE1_TLS_HINT_BENCHMARK.md): - Implementation summary - Build validation results - Benchmark methodology (to be executed) - Performance analysis framework ## Expected Performance - **Hit rate**: 85-95% (single-threaded), 70-85% (multi-threaded) - **Cycle savings**: 80-95% on cache hit (10-50 cycles → 2-5 cycles) - 
**Target improvement**: 15-20% throughput increase vs Headerless baseline - **Memory overhead**: 112 bytes per thread ## Box Theory **Mission**: Cache hot SuperSlabs to avoid global registry lookup **Boundary**: ptr → SuperSlab* or NULL (miss) **Invariant**: hint.base ≤ ptr < hint.end → hit is valid **Fallback**: Always safe to miss (triggers hak_super_lookup) **Thread Safety**: TLS storage, no synchronization required **Risk**: Low (read-only cache, fail-safe fallback, magic validation) ## Next Steps 1. Run full benchmark suite (sh8bench, cfrac, larson) 2. Measure actual hit rate with stats enabled 3. If performance target met (15-20% improvement), enable by default 4. Consider increasing cache slots if hit rate < 80% 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-03 18:06:24 +09:00
// Phase 1: TLS SuperSlab Hint Box for Headerless mode
// Size: 112 bytes per thread (4 slots * 24 bytes + 16 bytes overhead)
#if HAKMEM_TINY_SS_TLS_HINT
#include "box/tls_ss_hint_box.h"
__thread TlsSsHintCache g_tls_ss_hint = {0};
#endif
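The commit note describes the hint cache as a 4-slot FIFO in which a lookup hits when `base <= ptr < end`, and a miss is always safe. The following is a rough sketch of that behavior. The struct fields and the `demo_*` functions are hypothetical and do not reproduce the actual `TlsSsHintCache` layout or the `tls_ss_hint_*` API.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HINT_SLOTS 4

/* One cached range: [base, end) maps to an opaque SuperSlab pointer. */
typedef struct {
    uintptr_t base;
    uintptr_t end;
    void*     ss;
} DemoHintSlot;

typedef struct {
    DemoHintSlot slots[DEMO_HINT_SLOTS];
    uint32_t     count; /* number of valid slots, at most DEMO_HINT_SLOTS */
    uint32_t     next;  /* FIFO position that the next insert overwrites */
} DemoHintCache;

/* Insert a range, skipping duplicates and evicting the oldest entry when full. */
static void demo_hint_update(DemoHintCache* c, void* ss, uintptr_t base, uintptr_t end) {
    for (uint32_t i = 0; i < c->count; i++) {
        if (c->slots[i].ss == ss) return;
    }
    c->slots[c->next] = (DemoHintSlot){ base, end, ss };
    c->next = (c->next + 1) % DEMO_HINT_SLOTS;
    if (c->count < DEMO_HINT_SLOTS) c->count++;
}

/* Return the cached pointer on a hit, or NULL so the caller can fall back
 * to the slower global lookup. */
static void* demo_hint_lookup(const DemoHintCache* c, const void* ptr) {
    uintptr_t p = (uintptr_t)ptr;
    for (uint32_t i = 0; i < c->count; i++) {
        if (p >= c->slots[i].base && p < c->slots[i].end) return c->slots[i].ss;
    }
    return NULL;
}
```

Because the cache lives in thread-local storage, neither function needs locking.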
static int g_tiny_ultra = 0; // HAKMEM_TINY_ULTRA=1 for SLL-only ultra mode
static int g_ultra_validate = 0; // HAKMEM_TINY_ULTRA_VALIDATE=1 to enable per-pop validation
// Ultra debug counters
#if HAKMEM_DEBUG_COUNTERS
static __attribute__((unused)) uint64_t g_ultra_pop_hits[TINY_NUM_CLASSES] = {0};
static uint64_t g_ultra_refill_calls[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_ultra_resets[TINY_NUM_CLASSES] = {0};
#endif
// Path counters (normal mode visibility): lightweight, for debugging/bench only
#if HAKMEM_DEBUG_COUNTERS
static __attribute__((unused)) uint64_t g_path_sll_pop[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_path_mag_pop[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_path_front_pop[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_path_superslab[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_path_refill_calls[TINY_NUM_CLASSES] = {0};
// New: slow/bitmap/bump/bin instrumentation
static __attribute__((unused)) uint64_t g_alloc_slow_calls[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_superslab_refill_calls_dbg[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_bitmap_scan_calls[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_bgbin_pops[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_bump_hits[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_bump_arms[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_spec_calls[TINY_NUM_CLASSES] = {0};
static __attribute__((unused)) uint64_t g_spec_hits[TINY_NUM_CLASSES] = {0};
#endif
static int g_path_debug_enabled = 0;
// Spill hysteresis (eliminates getenv from the free hot path)
static int g_spill_hyst = 32; // default margin (configured at init; never getenv on hot path)
// Optional per-class refill batch overrides (0=use global defaults)
static int g_refill_max_c[TINY_NUM_CLASSES] = {0};
static int g_refill_max_hot_c[TINY_NUM_CLASSES] = {0};
static inline __attribute__((always_inline)) int tiny_refill_max_for_class(int class_idx) {
    int v = g_refill_max_c[class_idx];
    if (v > 0) return v;
    if (class_idx <= 3) {
        int hv = g_refill_max_hot_c[class_idx];
        if (hv > 0) return hv;
        return g_tiny_refill_max_hot;
    }
    return g_tiny_refill_max;
}
// Phase 9.5: Frontend/Backend split - Tiny Front modules (QuickSlot / FastCache)
#include "front/quick_slot.h"
#include "front/fast_cache.h"
__thread TinyFastCache g_fast_cache[TINY_NUM_CLASSES];
static int g_frontend_enable = 0; // HAKMEM_TINY_FRONTEND=1 (experimental ultra-fast frontend)
// SLL capacity multiplier for hot tiny classes (env: HAKMEM_SLL_MULTIPLIER)
int g_sll_multiplier = 2;
// Cached thread id (uint32) to avoid repeated pthread_self() in hot paths
static __thread uint32_t g_tls_tid32;
static __thread int g_tls_tid32_inited;
// Phase 6-1.7: Export for box refactor (Box 6 needs access from hakmem.c)
#ifdef HAKMEM_TINY_PHASE6_BOX_REFACTOR
inline __attribute__((always_inline)) uint32_t tiny_self_u32(void) {
#else
static inline __attribute__((always_inline)) uint32_t tiny_self_u32(void) {
#endif
    if (__builtin_expect(!g_tls_tid32_inited, 0)) {
        g_tls_tid32 = (uint32_t)(uintptr_t)pthread_self();
        g_tls_tid32_inited = 1;
    }
    return g_tls_tid32;
}
// Cached pthread_t as-is for APIs that require pthread_t comparison
static __thread pthread_t g_tls_pt_self;
static __thread int g_tls_pt_inited;
// Frontend FastCache hit/miss counters (Small diagnostics)
unsigned long long g_front_fc_hit[TINY_NUM_CLASSES] = {0};
unsigned long long g_front_fc_miss[TINY_NUM_CLASSES] = {0};
// TLS SLL class mask: bit i = 1 allows SLL for class i. Default: all 8 classes enabled.
int g_tls_sll_class_mask = 0xFF;
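The class mask is described as one bit per class, with bit i set meaning SLL is allowed for class i. A small sketch of how such a test could look follows. `demo_sll_class_mask` and `demo_sll_allowed` are hypothetical names, and the 8-class bound is taken from the comment above.

```c
#include <assert.h>

/* Hypothetical copy of the mask: 0xFF enables classes 0..7. */
static int demo_sll_class_mask = 0xFF;

/* Returns 1 if the bit for class_idx is set, 0 otherwise. */
static inline int demo_sll_allowed(int class_idx) {
    assert(class_idx >= 0 && class_idx < 8);
    return (demo_sll_class_mask >> class_idx) & 1;
}
```

With a mask of `0x0F`, for example, only classes 0 through 3 would pass the check.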
// Phase 6-1.7: Export for box refactor (Box 6 needs access from hakmem.c)
static inline __attribute__((always_inline)) pthread_t tiny_self_pt(void) {
    if (__builtin_expect(!g_tls_pt_inited, 0)) {
        g_tls_pt_self = pthread_self();
        g_tls_pt_inited = 1;
    }
    return g_tls_pt_self;
}
#include "tiny_refill.h"
// tiny_mmap_gate.h already included at top
#include "tiny_publish.h"
// Optional prefetch on SLL pop (guarded by env: HAKMEM_TINY_PREFETCH=1)
static int g_tiny_prefetch = 0;
// Small-class magazine pre-initialization (to avoid cap==0 checks on hot path)
// Hot-class small TLS magazine (definitions and switches)
typedef struct {
    void* slots[128];
    uint16_t top; // 0..128
    uint16_t cap; // =128
} TinyHotMag;
static int g_hotmag_cap_default = 128; // default capacity (fixed)
static int g_hotmag_refill_default = 32; // default refill batch (fixed)
static int g_hotmag_enable = 0; // default OFF (ENV toggle removed)
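The magazine struct above is a fixed-size stack: `top` counts the occupied slots and `cap` bounds it. The following is a minimal sketch of push and pop on a struct with the same shape. The `demo_*` helpers are hypothetical and are not the routines in `hakmem_tiny_hotmag.inc.h`.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as TinyHotMag, redeclared so the sketch stands alone. */
typedef struct {
    void*    slots[128];
    uint16_t top; /* number of occupied slots */
    uint16_t cap; /* usable capacity, at most 128 */
} DemoHotMag;

/* Push a block; returns 1 on success, 0 if the magazine is full. */
static inline int demo_hotmag_push(DemoHotMag* m, void* p) {
    if (m->top >= m->cap) return 0;
    m->slots[m->top++] = p;
    return 1;
}

/* Pop the most recently pushed block, or NULL if the magazine is empty. */
static inline void* demo_hotmag_pop(DemoHotMag* m) {
    if (m->top == 0) return NULL;
    return m->slots[--m->top];
}
```

LIFO order hands back the most recently freed block first, which is the block most likely to still be in cache.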
static uint16_t g_hotmag_cap_current[TINY_NUM_CLASSES];
static uint8_t g_hotmag_cap_locked[TINY_NUM_CLASSES];
static uint16_t g_hotmag_refill_current[TINY_NUM_CLASSES];
static uint8_t g_hotmag_refill_locked[TINY_NUM_CLASSES];
static uint8_t g_hotmag_class_en[TINY_NUM_CLASSES]; // 0=disabled for class, 1=enabled
static __thread TinyHotMag g_tls_hot_mag[TINY_NUM_CLASSES];
// Inline helpers
#include "box/tls_sll_box.h" // Box TLS-SLL: Safe SLL operations API (needed by hotmag)
#include "hakmem_tiny_hotmag.inc.h"
// Diagnostics: invalid TLS SLL pointers detected (range check failures)
_Atomic uint64_t g_tls_sll_invalid_head[TINY_NUM_CLASSES] = {0};
_Atomic uint64_t g_tls_sll_invalid_push[TINY_NUM_CLASSES] = {0};
_Atomic uint64_t g_tls_sll_pop_counter[TINY_NUM_CLASSES] = {0};
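// Illustrative sketch only: one way per-class diagnostic counters like the
// ones above can be bumped and read with relaxed atomics, so that counting
// adds no ordering constraints to the surrounding code. DEMO_NUM_CLASSES and
// the demo_* names are hypothetical; the real update sites are in
// box/tls_sll_box.h and may look different.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NUM_CLASSES 8  // hypothetical stand-in for TINY_NUM_CLASSES

static _Atomic uint64_t demo_invalid_push[DEMO_NUM_CLASSES];

// Record one range-check failure for a class. Out-of-range class indices
// are ignored so the diagnostic itself can never write out of bounds.
static inline void demo_count_invalid_push(int class_idx) {
    if (class_idx < 0 || class_idx >= DEMO_NUM_CLASSES) return;
    atomic_fetch_add_explicit(&demo_invalid_push[class_idx], 1,
                              memory_order_relaxed);
}

// Read the current count. Relaxed is enough for statistics: the value is
// only reported, never used to synchronize other memory.
static inline uint64_t demo_read_invalid_push(int class_idx) {
    if (class_idx < 0 || class_idx >= DEMO_NUM_CLASSES) return 0;
    return atomic_load_explicit(&demo_invalid_push[class_idx],
                                memory_order_relaxed);
}
```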
// Size-specialized tiny alloc (32B/64B) via function pointers (for A/B testing)
// TinyQuickSlot: 1 cache line per class (quick 6 items + small metadata)
// Opt-in via HAKMEM_TINY_QUICK=1
// NOTE: This type definition must come BEFORE the Phase 2D-1 includes below
int g_quick_enable = 0; // HAKMEM_TINY_QUICK=1
__thread TinyQuickSlot g_tls_quick[TINY_NUM_CLASSES]; // compile-out via guards below
// Phase 2D-1: Hot-path inline function extractions (Front)
// NOTE: TinyFastCache/TinyQuickSlot are already defined in front/
#include "hakmem_tiny_hot_pop.inc.h" // 4 functions: tiny_hot_pop_class{0..3}
#include "hakmem_tiny_refill.inc.h" // 8 functions: refill operations
#if HAKMEM_TINY_P0_BATCH_REFILL
#include "hakmem_tiny_refill_p0.inc.h" // P0 batch refill → direct refill into FastCache
#endif
// Phase 7 Task 3: Pre-warm TLS cache at init
// Pre-allocate blocks to reduce first-allocation miss penalty
#if HAKMEM_TINY_PREWARM_TLS
void hak_tiny_prewarm_tls_cache(void) {
    // Pre-warm each class with HAKMEM_TINY_PREWARM_COUNT blocks
    // This reduces the first-allocation miss penalty by populating TLS cache
    // Phase E1-CORRECT: ALL classes (including C7) now use TLS SLL
    for (int class_idx = 0; class_idx < TINY_NUM_CLASSES; class_idx++) {
        // When the TinyHeap front is ON, do not push the routed classes onto
        // the TLS SLL; leave them to TinyHeapBox.
        if (tiny_heap_class_route_enabled(class_idx)) {
            continue;
        }
        int count = HAKMEM_TINY_PREWARM_COUNT; // Default: 16 blocks per class
        // Trigger refill to populate TLS cache
        // P0 Fix: Use appropriate refill function based on P0 status
#if HAKMEM_TINY_P0_BATCH_REFILL
        sll_refill_batch_from_ss(class_idx, count);
#else
        sll_refill_small_from_ss(class_idx, count);
#endif
    }
}
#endif
// Ultra-Simple front - REMOVED (dead code cleanup 2025-11-27)
// HotMag helpers (for classes 0..3)
static inline int is_hot_class(int class_idx) { return class_idx >= 0 && class_idx <= 3; }
// Optional front (HotMag) push helper: compile-out in release builds
static inline int tiny_optional_push(int class_idx, void* ptr) {
#if HAKMEM_BUILD_RELEASE
    (void)class_idx;
    (void)ptr;
    return 0;
#else
    if (__builtin_expect(is_hot_class(class_idx), 0)) {
        if (__builtin_expect(hotmag_push(class_idx, ptr), 0)) {
            return 1;
        }
    }
    return 0;
#endif
}
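// Illustrative sketch only: the compile-out pattern used by
// tiny_optional_push(), reduced to a standalone form. DEMO_BUILD_RELEASE,
// demo_is_hot_class, demo_front_push and demo_optional_push are hypothetical
// names for this example; the stub front push simply accepts every pointer,
// which the real hotmag_push() does not promise.

```c
#ifndef DEMO_BUILD_RELEASE
#define DEMO_BUILD_RELEASE 0  // hypothetical stand-in for HAKMEM_BUILD_RELEASE
#endif

// Same range test as is_hot_class(): classes 0..3 are "hot".
static inline int demo_is_hot_class(int class_idx) {
    return class_idx >= 0 && class_idx <= 3;
}

// Stub front cache: pretends the push always succeeds.
static inline int demo_front_push(int class_idx, void* ptr) {
    (void)class_idx;
    (void)ptr;
    return 1;
}

// Returns 1 if the optional front took the pointer, 0 if the caller must
// fall through to the regular free path.
static inline int demo_optional_push(int class_idx, void* ptr) {
#if DEMO_BUILD_RELEASE
    // Release: the whole front is compiled out; the casts silence
    // unused-parameter warnings and the function folds to "return 0".
    (void)class_idx;
    (void)ptr;
    return 0;
#else
    // Hint 0 marks the branch as unlikely, keeping the fall-through path hot.
    if (__builtin_expect(demo_is_hot_class(class_idx), 0)) {
        if (__builtin_expect(demo_front_push(class_idx, ptr), 0)) {
            return 1;
        }
    }
    return 0;
#endif
}
```

// Because the release branch returns a constant, callers of the form
// "if (demo_optional_push(...)) return;" cost nothing after inlining.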
// Phase 9.6: Deferred Intelligence (event queue + background)
// Extended event for FLINT Intelligence (lightweight; recorded off hot path only)
// Observability, ACE, and intelligence helpers