Files
hakmem/core/box/warm_pool_stats_box.h

54 lines
1.6 KiB
C
Raw Normal View History

Modularize Warm Pool with 3 Box Refactorings - Phase B-3a Complete Objective: Clean up warm pool implementation by extracting inline boxes for statistics, carving, and prefill logic. Achieved full modularity with zero performance regression using aggressive inline optimization. Changes: 1. **Legacy Code Removal** (Phase 0) - Removed unused static __thread prefill_attempt_count variable - Cleaned up duplicate comments - Simplified carve failure handling 2. **Warm Pool Statistics Box** (Phase 1) - New file: core/box/warm_pool_stats_box.h - Inline APIs: warm_pool_record_hit/miss/prefilled() - All statistics recording externalized - Integrated into unified_cache.c - Performance: 0 cost (inlined to direct memory write) 3. **Slab Carving Box** (Phase 2) - New file: core/box/slab_carve_box.h - Inline API: slab_carve_from_ss() - Extracted unified_cache_carve_from_ss() function - Now reusable by other refill paths (P0, etc.) - Performance: 100% inlined, O(slabs) scan unchanged 4. **Warm Pool Prefill Box** (Phase 3) - New file: core/box/warm_pool_prefill_box.h - Inline API: warm_pool_do_prefill() - Extracted prefill loop with configurable budget - WARM_POOL_PREFILL_BUDGET = 3 (tunable) - Cold path optimization (only on empty pool) - Performance: Cold path cost (non-critical) Architecture: - core/front/tiny_unified_cache.c now 40+ lines shorter - Logic distributed to 3 well-defined boxes - Each box has single responsibility (SRP) - Inline compilation preserves hot path performance - LTO (-flto) enables cross-file inlining Performance Results: - 1M allocations: 4.099M ops/s (maintained) - 5M allocations: 4.046M ops/s (maintained) - 55.6% warm pool hit rate (unchanged) - Zero regression on throughput - All three boxes fully inlined by compiler Code Quality Improvements: ✅ Removed legacy unused variables ✅ Separated concerns into specialized boxes ✅ Improved readability and maintainability ✅ Preserved performance via aggressive inline ✅ Enabled future reuse (carve box for P0) Testing: ✅ Compilation: No errors ✅ Functionality: 1M and 5M allocation tests pass ✅ Performance: Baseline maintained ✅ Statistics: Output identical to pre-refactor Next Phase: Consider similar modularization for: - Registry scanning (registry_scan_box.h) - TLS management (tls_management_box.h) - Cache operations (unified_cache_policy_box.h) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 23:39:02 +09:00
// warm_pool_stats_box.h - Warm Pool Statistics Box
// Purpose: Encapsulate warm pool statistics recording with inline APIs
// License: MIT
// Date: 2025-12-04
#ifndef HAK_WARM_POOL_STATS_BOX_H
#define HAK_WARM_POOL_STATS_BOX_H
#include <stdint.h>
#include "../hakmem_tiny_config.h"
#include "../front/tiny_warm_pool.h"
// ============================================================================
// External TLS Statistics (defined in core/front/tiny_unified_cache.c)
// ============================================================================
extern __thread TinyWarmPoolStats g_warm_pool_stats[TINY_NUM_CLASSES];
// ============================================================================
// Inline Statistics Recording API
// ============================================================================
// Record a warm pool hit
// Called when warm_pool_pop() succeeds and carve produces blocks
static inline void warm_pool_record_hit(int class_idx) {
Performance Optimization: Release Build Hygiene (Priority 1-4) Implement 4 targeted optimizations for release builds: 1. **Remove freelist validation from release builds** (Priority 1) - Guard registry lookup on every freelist node with #if !HAKMEM_BUILD_RELEASE - Expected gain: +15-20% throughput (eliminates 30-40% of refill cycles) - File: core/front/tiny_unified_cache.c:501-529 2. **Optimize PageFault telemetry** (Priority 2) - Already properly gated with HAKMEM_DEBUG_COUNTERS - No change needed (verified correct implementation) 3. **Make warm pool stats compile-time gated** (Priority 3) - Guard all stats recording with #if HAKMEM_DEBUG_COUNTERS - File: core/box/warm_pool_stats_box.h:25-51 4. **Reduce warm pool prefill lock overhead** (Priority 4) - Reduced WARM_POOL_PREFILL_BUDGET from 3 to 2 SuperSlabs - Balances prefill lock overhead with pool depletion frequency - File: core/box/warm_pool_prefill_box.h:28 5. **Disable debug counters by default in release builds** (Supporting) - Modified HAKMEM_DEBUG_COUNTERS to auto-detect based on NDEBUG - File: core/hakmem_build_flags.h:33-40 Benchmark Results (1M allocations, ws=256): - Before: 4.02-4.2M ops/s (with diagnostic overhead) - After: 4.04-4.2M ops/s (release build optimized) - Warm pool hit rate: Maintained at 55.6% - No performance regressions detected Expected Impact After Compilation: - With -DHAKMEM_BUILD_RELEASE=1 and -DNDEBUG: - Freelist validation: compiled out completely - Debug counters: compiled out completely - Telemetry: compiled out completely - Stats recording: compiled out (single (void) statement remains) - Expected +15-25% improvement in release builds 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 06:16:12 +09:00
#if HAKMEM_DEBUG_COUNTERS
Modularize Warm Pool with 3 Box Refactorings - Phase B-3a Complete Objective: Clean up warm pool implementation by extracting inline boxes for statistics, carving, and prefill logic. Achieved full modularity with zero performance regression using aggressive inline optimization. Changes: 1. **Legacy Code Removal** (Phase 0) - Removed unused static __thread prefill_attempt_count variable - Cleaned up duplicate comments - Simplified carve failure handling 2. **Warm Pool Statistics Box** (Phase 1) - New file: core/box/warm_pool_stats_box.h - Inline APIs: warm_pool_record_hit/miss/prefilled() - All statistics recording externalized - Integrated into unified_cache.c - Performance: 0 cost (inlined to direct memory write) 3. **Slab Carving Box** (Phase 2) - New file: core/box/slab_carve_box.h - Inline API: slab_carve_from_ss() - Extracted unified_cache_carve_from_ss() function - Now reusable by other refill paths (P0, etc.) - Performance: 100% inlined, O(slabs) scan unchanged 4. **Warm Pool Prefill Box** (Phase 3) - New file: core/box/warm_pool_prefill_box.h - Inline API: warm_pool_do_prefill() - Extracted prefill loop with configurable budget - WARM_POOL_PREFILL_BUDGET = 3 (tunable) - Cold path optimization (only on empty pool) - Performance: Cold path cost (non-critical) Architecture: - core/front/tiny_unified_cache.c now 40+ lines shorter - Logic distributed to 3 well-defined boxes - Each box has single responsibility (SRP) - Inline compilation preserves hot path performance - LTO (-flto) enables cross-file inlining Performance Results: - 1M allocations: 4.099M ops/s (maintained) - 5M allocations: 4.046M ops/s (maintained) - 55.6% warm pool hit rate (unchanged) - Zero regression on throughput - All three boxes fully inlined by compiler Code Quality Improvements: ✅ Removed legacy unused variables ✅ Separated concerns into specialized boxes ✅ Improved readability and maintainability ✅ Preserved performance via aggressive inline ✅ Enabled future reuse (carve box for P0) Testing: ✅ Compilation: No errors ✅ Functionality: 1M and 5M allocation tests pass ✅ Performance: Baseline maintained ✅ Statistics: Output identical to pre-refactor Next Phase: Consider similar modularization for: - Registry scanning (registry_scan_box.h) - TLS management (tls_management_box.h) - Cache operations (unified_cache_policy_box.h) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 23:39:02 +09:00
g_warm_pool_stats[class_idx].hits++;
Performance Optimization: Release Build Hygiene (Priority 1-4) Implement 4 targeted optimizations for release builds: 1. **Remove freelist validation from release builds** (Priority 1) - Guard registry lookup on every freelist node with #if !HAKMEM_BUILD_RELEASE - Expected gain: +15-20% throughput (eliminates 30-40% of refill cycles) - File: core/front/tiny_unified_cache.c:501-529 2. **Optimize PageFault telemetry** (Priority 2) - Already properly gated with HAKMEM_DEBUG_COUNTERS - No change needed (verified correct implementation) 3. **Make warm pool stats compile-time gated** (Priority 3) - Guard all stats recording with #if HAKMEM_DEBUG_COUNTERS - File: core/box/warm_pool_stats_box.h:25-51 4. **Reduce warm pool prefill lock overhead** (Priority 4) - Reduced WARM_POOL_PREFILL_BUDGET from 3 to 2 SuperSlabs - Balances prefill lock overhead with pool depletion frequency - File: core/box/warm_pool_prefill_box.h:28 5. **Disable debug counters by default in release builds** (Supporting) - Modified HAKMEM_DEBUG_COUNTERS to auto-detect based on NDEBUG - File: core/hakmem_build_flags.h:33-40 Benchmark Results (1M allocations, ws=256): - Before: 4.02-4.2M ops/s (with diagnostic overhead) - After: 4.04-4.2M ops/s (release build optimized) - Warm pool hit rate: Maintained at 55.6% - No performance regressions detected Expected Impact After Compilation: - With -DHAKMEM_BUILD_RELEASE=1 and -DNDEBUG: - Freelist validation: compiled out completely - Debug counters: compiled out completely - Telemetry: compiled out completely - Stats recording: compiled out (single (void) statement remains) - Expected +15-25% improvement in release builds 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 06:16:12 +09:00
#else
(void)class_idx;
#endif
Modularize Warm Pool with 3 Box Refactorings - Phase B-3a Complete Objective: Clean up warm pool implementation by extracting inline boxes for statistics, carving, and prefill logic. Achieved full modularity with zero performance regression using aggressive inline optimization. Changes: 1. **Legacy Code Removal** (Phase 0) - Removed unused static __thread prefill_attempt_count variable - Cleaned up duplicate comments - Simplified carve failure handling 2. **Warm Pool Statistics Box** (Phase 1) - New file: core/box/warm_pool_stats_box.h - Inline APIs: warm_pool_record_hit/miss/prefilled() - All statistics recording externalized - Integrated into unified_cache.c - Performance: 0 cost (inlined to direct memory write) 3. **Slab Carving Box** (Phase 2) - New file: core/box/slab_carve_box.h - Inline API: slab_carve_from_ss() - Extracted unified_cache_carve_from_ss() function - Now reusable by other refill paths (P0, etc.) - Performance: 100% inlined, O(slabs) scan unchanged 4. **Warm Pool Prefill Box** (Phase 3) - New file: core/box/warm_pool_prefill_box.h - Inline API: warm_pool_do_prefill() - Extracted prefill loop with configurable budget - WARM_POOL_PREFILL_BUDGET = 3 (tunable) - Cold path optimization (only on empty pool) - Performance: Cold path cost (non-critical) Architecture: - core/front/tiny_unified_cache.c now 40+ lines shorter - Logic distributed to 3 well-defined boxes - Each box has single responsibility (SRP) - Inline compilation preserves hot path performance - LTO (-flto) enables cross-file inlining Performance Results: - 1M allocations: 4.099M ops/s (maintained) - 5M allocations: 4.046M ops/s (maintained) - 55.6% warm pool hit rate (unchanged) - Zero regression on throughput - All three boxes fully inlined by compiler Code Quality Improvements: ✅ Removed legacy unused variables ✅ Separated concerns into specialized boxes ✅ Improved readability and maintainability ✅ Preserved performance via aggressive inline ✅ Enabled future reuse (carve box for P0) Testing: ✅ Compilation: No errors ✅ Functionality: 1M and 5M allocation tests pass ✅ Performance: Baseline maintained ✅ Statistics: Output identical to pre-refactor Next Phase: Consider similar modularization for: - Registry scanning (registry_scan_box.h) - TLS management (tls_management_box.h) - Cache operations (unified_cache_policy_box.h) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 23:39:02 +09:00
}
// Record a warm pool miss
// Called when warm_pool_pop() returns NULL (pool empty)
static inline void warm_pool_record_miss(int class_idx) {
Performance Optimization: Release Build Hygiene (Priority 1-4) Implement 4 targeted optimizations for release builds: 1. **Remove freelist validation from release builds** (Priority 1) - Guard registry lookup on every freelist node with #if !HAKMEM_BUILD_RELEASE - Expected gain: +15-20% throughput (eliminates 30-40% of refill cycles) - File: core/front/tiny_unified_cache.c:501-529 2. **Optimize PageFault telemetry** (Priority 2) - Already properly gated with HAKMEM_DEBUG_COUNTERS - No change needed (verified correct implementation) 3. **Make warm pool stats compile-time gated** (Priority 3) - Guard all stats recording with #if HAKMEM_DEBUG_COUNTERS - File: core/box/warm_pool_stats_box.h:25-51 4. **Reduce warm pool prefill lock overhead** (Priority 4) - Reduced WARM_POOL_PREFILL_BUDGET from 3 to 2 SuperSlabs - Balances prefill lock overhead with pool depletion frequency - File: core/box/warm_pool_prefill_box.h:28 5. **Disable debug counters by default in release builds** (Supporting) - Modified HAKMEM_DEBUG_COUNTERS to auto-detect based on NDEBUG - File: core/hakmem_build_flags.h:33-40 Benchmark Results (1M allocations, ws=256): - Before: 4.02-4.2M ops/s (with diagnostic overhead) - After: 4.04-4.2M ops/s (release build optimized) - Warm pool hit rate: Maintained at 55.6% - No performance regressions detected Expected Impact After Compilation: - With -DHAKMEM_BUILD_RELEASE=1 and -DNDEBUG: - Freelist validation: compiled out completely - Debug counters: compiled out completely - Telemetry: compiled out completely - Stats recording: compiled out (single (void) statement remains) - Expected +15-25% improvement in release builds 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 06:16:12 +09:00
#if HAKMEM_DEBUG_COUNTERS
Modularize Warm Pool with 3 Box Refactorings - Phase B-3a Complete Objective: Clean up warm pool implementation by extracting inline boxes for statistics, carving, and prefill logic. Achieved full modularity with zero performance regression using aggressive inline optimization. Changes: 1. **Legacy Code Removal** (Phase 0) - Removed unused static __thread prefill_attempt_count variable - Cleaned up duplicate comments - Simplified carve failure handling 2. **Warm Pool Statistics Box** (Phase 1) - New file: core/box/warm_pool_stats_box.h - Inline APIs: warm_pool_record_hit/miss/prefilled() - All statistics recording externalized - Integrated into unified_cache.c - Performance: 0 cost (inlined to direct memory write) 3. **Slab Carving Box** (Phase 2) - New file: core/box/slab_carve_box.h - Inline API: slab_carve_from_ss() - Extracted unified_cache_carve_from_ss() function - Now reusable by other refill paths (P0, etc.) - Performance: 100% inlined, O(slabs) scan unchanged 4. **Warm Pool Prefill Box** (Phase 3) - New file: core/box/warm_pool_prefill_box.h - Inline API: warm_pool_do_prefill() - Extracted prefill loop with configurable budget - WARM_POOL_PREFILL_BUDGET = 3 (tunable) - Cold path optimization (only on empty pool) - Performance: Cold path cost (non-critical) Architecture: - core/front/tiny_unified_cache.c now 40+ lines shorter - Logic distributed to 3 well-defined boxes - Each box has single responsibility (SRP) - Inline compilation preserves hot path performance - LTO (-flto) enables cross-file inlining Performance Results: - 1M allocations: 4.099M ops/s (maintained) - 5M allocations: 4.046M ops/s (maintained) - 55.6% warm pool hit rate (unchanged) - Zero regression on throughput - All three boxes fully inlined by compiler Code Quality Improvements: ✅ Removed legacy unused variables ✅ Separated concerns into specialized boxes ✅ Improved readability and maintainability ✅ Preserved performance via aggressive inline ✅ Enabled future reuse (carve box for P0) Testing: ✅ Compilation: No errors ✅ Functionality: 1M and 5M allocation tests pass ✅ Performance: Baseline maintained ✅ Statistics: Output identical to pre-refactor Next Phase: Consider similar modularization for: - Registry scanning (registry_scan_box.h) - TLS management (tls_management_box.h) - Cache operations (unified_cache_policy_box.h) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 23:39:02 +09:00
g_warm_pool_stats[class_idx].misses++;
Performance Optimization: Release Build Hygiene (Priority 1-4) Implement 4 targeted optimizations for release builds: 1. **Remove freelist validation from release builds** (Priority 1) - Guard registry lookup on every freelist node with #if !HAKMEM_BUILD_RELEASE - Expected gain: +15-20% throughput (eliminates 30-40% of refill cycles) - File: core/front/tiny_unified_cache.c:501-529 2. **Optimize PageFault telemetry** (Priority 2) - Already properly gated with HAKMEM_DEBUG_COUNTERS - No change needed (verified correct implementation) 3. **Make warm pool stats compile-time gated** (Priority 3) - Guard all stats recording with #if HAKMEM_DEBUG_COUNTERS - File: core/box/warm_pool_stats_box.h:25-51 4. **Reduce warm pool prefill lock overhead** (Priority 4) - Reduced WARM_POOL_PREFILL_BUDGET from 3 to 2 SuperSlabs - Balances prefill lock overhead with pool depletion frequency - File: core/box/warm_pool_prefill_box.h:28 5. **Disable debug counters by default in release builds** (Supporting) - Modified HAKMEM_DEBUG_COUNTERS to auto-detect based on NDEBUG - File: core/hakmem_build_flags.h:33-40 Benchmark Results (1M allocations, ws=256): - Before: 4.02-4.2M ops/s (with diagnostic overhead) - After: 4.04-4.2M ops/s (release build optimized) - Warm pool hit rate: Maintained at 55.6% - No performance regressions detected Expected Impact After Compilation: - With -DHAKMEM_BUILD_RELEASE=1 and -DNDEBUG: - Freelist validation: compiled out completely - Debug counters: compiled out completely - Telemetry: compiled out completely - Stats recording: compiled out (single (void) statement remains) - Expected +15-25% improvement in release builds 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 06:16:12 +09:00
#else
(void)class_idx;
#endif
Modularize Warm Pool with 3 Box Refactorings - Phase B-3a Complete Objective: Clean up warm pool implementation by extracting inline boxes for statistics, carving, and prefill logic. Achieved full modularity with zero performance regression using aggressive inline optimization. Changes: 1. **Legacy Code Removal** (Phase 0) - Removed unused static __thread prefill_attempt_count variable - Cleaned up duplicate comments - Simplified carve failure handling 2. **Warm Pool Statistics Box** (Phase 1) - New file: core/box/warm_pool_stats_box.h - Inline APIs: warm_pool_record_hit/miss/prefilled() - All statistics recording externalized - Integrated into unified_cache.c - Performance: 0 cost (inlined to direct memory write) 3. **Slab Carving Box** (Phase 2) - New file: core/box/slab_carve_box.h - Inline API: slab_carve_from_ss() - Extracted unified_cache_carve_from_ss() function - Now reusable by other refill paths (P0, etc.) - Performance: 100% inlined, O(slabs) scan unchanged 4. **Warm Pool Prefill Box** (Phase 3) - New file: core/box/warm_pool_prefill_box.h - Inline API: warm_pool_do_prefill() - Extracted prefill loop with configurable budget - WARM_POOL_PREFILL_BUDGET = 3 (tunable) - Cold path optimization (only on empty pool) - Performance: Cold path cost (non-critical) Architecture: - core/front/tiny_unified_cache.c now 40+ lines shorter - Logic distributed to 3 well-defined boxes - Each box has single responsibility (SRP) - Inline compilation preserves hot path performance - LTO (-flto) enables cross-file inlining Performance Results: - 1M allocations: 4.099M ops/s (maintained) - 5M allocations: 4.046M ops/s (maintained) - 55.6% warm pool hit rate (unchanged) - Zero regression on throughput - All three boxes fully inlined by compiler Code Quality Improvements: ✅ Removed legacy unused variables ✅ Separated concerns into specialized boxes ✅ Improved readability and maintainability ✅ Preserved performance via aggressive inline ✅ Enabled future reuse (carve box for P0) Testing: ✅ Compilation: No errors ✅ Functionality: 1M and 5M allocation tests pass ✅ Performance: Baseline maintained ✅ Statistics: Output identical to pre-refactor Next Phase: Consider similar modularization for: - Registry scanning (registry_scan_box.h) - TLS management (tls_management_box.h) - Cache operations (unified_cache_policy_box.h) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 23:39:02 +09:00
}
// Record a warm pool prefill event
// Called when pool is empty and we do secondary prefill
static inline void warm_pool_record_prefilled(int class_idx) {
Performance Optimization: Release Build Hygiene (Priority 1-4) Implement 4 targeted optimizations for release builds: 1. **Remove freelist validation from release builds** (Priority 1) - Guard registry lookup on every freelist node with #if !HAKMEM_BUILD_RELEASE - Expected gain: +15-20% throughput (eliminates 30-40% of refill cycles) - File: core/front/tiny_unified_cache.c:501-529 2. **Optimize PageFault telemetry** (Priority 2) - Already properly gated with HAKMEM_DEBUG_COUNTERS - No change needed (verified correct implementation) 3. **Make warm pool stats compile-time gated** (Priority 3) - Guard all stats recording with #if HAKMEM_DEBUG_COUNTERS - File: core/box/warm_pool_stats_box.h:25-51 4. **Reduce warm pool prefill lock overhead** (Priority 4) - Reduced WARM_POOL_PREFILL_BUDGET from 3 to 2 SuperSlabs - Balances prefill lock overhead with pool depletion frequency - File: core/box/warm_pool_prefill_box.h:28 5. **Disable debug counters by default in release builds** (Supporting) - Modified HAKMEM_DEBUG_COUNTERS to auto-detect based on NDEBUG - File: core/hakmem_build_flags.h:33-40 Benchmark Results (1M allocations, ws=256): - Before: 4.02-4.2M ops/s (with diagnostic overhead) - After: 4.04-4.2M ops/s (release build optimized) - Warm pool hit rate: Maintained at 55.6% - No performance regressions detected Expected Impact After Compilation: - With -DHAKMEM_BUILD_RELEASE=1 and -DNDEBUG: - Freelist validation: compiled out completely - Debug counters: compiled out completely - Telemetry: compiled out completely - Stats recording: compiled out (single (void) statement remains) - Expected +15-25% improvement in release builds 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 06:16:12 +09:00
#if HAKMEM_DEBUG_COUNTERS
Modularize Warm Pool with 3 Box Refactorings - Phase B-3a Complete Objective: Clean up warm pool implementation by extracting inline boxes for statistics, carving, and prefill logic. Achieved full modularity with zero performance regression using aggressive inline optimization. Changes: 1. **Legacy Code Removal** (Phase 0) - Removed unused static __thread prefill_attempt_count variable - Cleaned up duplicate comments - Simplified carve failure handling 2. **Warm Pool Statistics Box** (Phase 1) - New file: core/box/warm_pool_stats_box.h - Inline APIs: warm_pool_record_hit/miss/prefilled() - All statistics recording externalized - Integrated into unified_cache.c - Performance: 0 cost (inlined to direct memory write) 3. **Slab Carving Box** (Phase 2) - New file: core/box/slab_carve_box.h - Inline API: slab_carve_from_ss() - Extracted unified_cache_carve_from_ss() function - Now reusable by other refill paths (P0, etc.) - Performance: 100% inlined, O(slabs) scan unchanged 4. **Warm Pool Prefill Box** (Phase 3) - New file: core/box/warm_pool_prefill_box.h - Inline API: warm_pool_do_prefill() - Extracted prefill loop with configurable budget - WARM_POOL_PREFILL_BUDGET = 3 (tunable) - Cold path optimization (only on empty pool) - Performance: Cold path cost (non-critical) Architecture: - core/front/tiny_unified_cache.c now 40+ lines shorter - Logic distributed to 3 well-defined boxes - Each box has single responsibility (SRP) - Inline compilation preserves hot path performance - LTO (-flto) enables cross-file inlining Performance Results: - 1M allocations: 4.099M ops/s (maintained) - 5M allocations: 4.046M ops/s (maintained) - 55.6% warm pool hit rate (unchanged) - Zero regression on throughput - All three boxes fully inlined by compiler Code Quality Improvements: ✅ Removed legacy unused variables ✅ Separated concerns into specialized boxes ✅ Improved readability and maintainability ✅ Preserved performance via aggressive inline ✅ Enabled future reuse (carve box for P0) Testing: ✅ Compilation: No errors ✅ Functionality: 1M and 5M allocation tests pass ✅ Performance: Baseline maintained ✅ Statistics: Output identical to pre-refactor Next Phase: Consider similar modularization for: - Registry scanning (registry_scan_box.h) - TLS management (tls_management_box.h) - Cache operations (unified_cache_policy_box.h) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 23:39:02 +09:00
g_warm_pool_stats[class_idx].prefilled++;
Performance Optimization: Release Build Hygiene (Priority 1-4) Implement 4 targeted optimizations for release builds: 1. **Remove freelist validation from release builds** (Priority 1) - Guard registry lookup on every freelist node with #if !HAKMEM_BUILD_RELEASE - Expected gain: +15-20% throughput (eliminates 30-40% of refill cycles) - File: core/front/tiny_unified_cache.c:501-529 2. **Optimize PageFault telemetry** (Priority 2) - Already properly gated with HAKMEM_DEBUG_COUNTERS - No change needed (verified correct implementation) 3. **Make warm pool stats compile-time gated** (Priority 3) - Guard all stats recording with #if HAKMEM_DEBUG_COUNTERS - File: core/box/warm_pool_stats_box.h:25-51 4. **Reduce warm pool prefill lock overhead** (Priority 4) - Reduced WARM_POOL_PREFILL_BUDGET from 3 to 2 SuperSlabs - Balances prefill lock overhead with pool depletion frequency - File: core/box/warm_pool_prefill_box.h:28 5. **Disable debug counters by default in release builds** (Supporting) - Modified HAKMEM_DEBUG_COUNTERS to auto-detect based on NDEBUG - File: core/hakmem_build_flags.h:33-40 Benchmark Results (1M allocations, ws=256): - Before: 4.02-4.2M ops/s (with diagnostic overhead) - After: 4.04-4.2M ops/s (release build optimized) - Warm pool hit rate: Maintained at 55.6% - No performance regressions detected Expected Impact After Compilation: - With -DHAKMEM_BUILD_RELEASE=1 and -DNDEBUG: - Freelist validation: compiled out completely - Debug counters: compiled out completely - Telemetry: compiled out completely - Stats recording: compiled out (single (void) statement remains) - Expected +15-25% improvement in release builds 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 06:16:12 +09:00
#else
(void)class_idx;
#endif
Modularize Warm Pool with 3 Box Refactorings - Phase B-3a Complete Objective: Clean up warm pool implementation by extracting inline boxes for statistics, carving, and prefill logic. Achieved full modularity with zero performance regression using aggressive inline optimization. Changes: 1. **Legacy Code Removal** (Phase 0) - Removed unused static __thread prefill_attempt_count variable - Cleaned up duplicate comments - Simplified carve failure handling 2. **Warm Pool Statistics Box** (Phase 1) - New file: core/box/warm_pool_stats_box.h - Inline APIs: warm_pool_record_hit/miss/prefilled() - All statistics recording externalized - Integrated into unified_cache.c - Performance: 0 cost (inlined to direct memory write) 3. **Slab Carving Box** (Phase 2) - New file: core/box/slab_carve_box.h - Inline API: slab_carve_from_ss() - Extracted unified_cache_carve_from_ss() function - Now reusable by other refill paths (P0, etc.) - Performance: 100% inlined, O(slabs) scan unchanged 4. **Warm Pool Prefill Box** (Phase 3) - New file: core/box/warm_pool_prefill_box.h - Inline API: warm_pool_do_prefill() - Extracted prefill loop with configurable budget - WARM_POOL_PREFILL_BUDGET = 3 (tunable) - Cold path optimization (only on empty pool) - Performance: Cold path cost (non-critical) Architecture: - core/front/tiny_unified_cache.c now 40+ lines shorter - Logic distributed to 3 well-defined boxes - Each box has single responsibility (SRP) - Inline compilation preserves hot path performance - LTO (-flto) enables cross-file inlining Performance Results: - 1M allocations: 4.099M ops/s (maintained) - 5M allocations: 4.046M ops/s (maintained) - 55.6% warm pool hit rate (unchanged) - Zero regression on throughput - All three boxes fully inlined by compiler Code Quality Improvements: ✅ Removed legacy unused variables ✅ Separated concerns into specialized boxes ✅ Improved readability and maintainability ✅ Preserved performance via aggressive inline ✅ Enabled future reuse (carve box for P0) Testing: ✅ Compilation: No errors ✅ Functionality: 1M and 5M allocation tests pass ✅ Performance: Baseline maintained ✅ Statistics: Output identical to pre-refactor Next Phase: Consider similar modularization for: - Registry scanning (registry_scan_box.h) - TLS management (tls_management_box.h) - Cache operations (unified_cache_policy_box.h) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 23:39:02 +09:00
}
#endif // HAK_WARM_POOL_STATS_BOX_H