Commit Graph

4 Commits

Author SHA1 Message Date
cd3280eee7 Implement MADV_POPULATE_WRITE fix for SuperSlab allocation
Add support for MADV_POPULATE_WRITE (Linux 5.14+) to force page population
AFTER munmap trimming in SuperSlab fallback path.

Changes:
1. core/box/ss_os_acquire_box.c (lines 171-201):
   - Apply MADV_POPULATE_WRITE after munmap prefix/suffix trim
   - Fallback to explicit page touch for kernels < 5.14
   - Always cleanup suffix region (remove MADV_DONTNEED path)

2. core/superslab_cache.c (lines 111-121):
   - Use MADV_POPULATE_WRITE instead of memset for efficiency
   - Fallback to memset if madvise fails

Testing Results:
- Page faults: Unchanged (~145K per 1M ops)
- Throughput: -2% (4.18M → 4.10M ops/s with HAKMEM_SS_PREFAULT=1)
- Root cause: 97.6% of page faults are from libc memset in initialization,
  not from SuperSlab memory access

Conclusion: MADV_POPULATE_WRITE is effective for SuperSlab memory,
but overall page fault bottleneck comes from TLS/shared pool initialization.
Startup warmup remains the most effective solution (already implemented
in bench_random_mixed.c with +9.5% improvement).

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 10:42:47 +09:00
cba6f785a1 Add SuperSlab Prefault Box with 4MB MAP_POPULATE bug fix
New Feature: ss_prefault_box.h
- Box for controlling SuperSlab page prefaulting policy
- ENV: HAKMEM_SS_PREFAULT (0=OFF, 1=POPULATE, 2=TOUCH)
- Default: OFF (safe mode until further optimization)

Bug Fix: 4MB MAP_POPULATE regression
- Problem: Fallback path allocated 4MB (2x size for alignment) with MAP_POPULATE
  causing 52x slower mmap (0.585ms → 30.6ms) and 35% throughput regression
- Solution: Remove MAP_POPULATE from 4MB allocation, apply madvise(MADV_WILLNEED)
  only to the aligned 2MB region after trimming prefix/suffix

Changes:
- core/box/ss_prefault_box.h: New prefault policy box (header-only)
- core/box/ss_allocation_box.c: Integrate prefault box, call ss_prefault_region()
- core/superslab_cache.c: Fix fallback path - no MAP_POPULATE on 4MB,
  always munmap prefix/suffix, use MADV_WILLNEED for 2MB only
- docs/specs/ENV_VARS*.md: Document HAKMEM_SS_PREFAULT

Performance:
- bench_random_mixed: 4.32M ops/s (regression fixed, slight improvement)
- bench_tiny_hot: 157M ops/s with prefault=1 (no crash)

Box Theory:
- OS layer (ss_os_acquire): "how to mmap"
- Prefault Box: "when to page-in"
- Allocation Box: "when to call prefault"

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 20:11:24 +09:00
25cb7164c7 Comprehensive legacy cleanup and architecture consolidation
Summary of Changes:

MOVED TO ARCHIVE:
- core/hakmem_tiny_legacy_slow_box.inc → archive/
  * Slow path legacy code preserved for reference
  * Superseded by Gatekeeper Box architecture

- core/superslab_allocate.c → archive/superslab_allocate_legacy.c
  * Legacy SuperSlab allocation implementation
  * Functionality integrated into new Box system

- core/superslab_head.c → archive/superslab_head_legacy.c
  * Legacy slab head management
  * Refactored through Box architecture

REMOVED DEAD CODE:
- Eliminated unused allocation policy variants from ss_allocation_box.c
  * Reduced from 127+ lines of conditional logic to focused implementation
  * Removed: old policy branches, unused allocation strategies
  * Kept: current Box-based allocation path

ADDED NEW INFRASTRUCTURE:
- core/superslab_head_stub.c (41 lines)
  * Minimal stub for backward compatibility
  * Delegates to new architecture

- Enhanced core/superslab_cache.c (75 lines added)
  * Added missing API functions for cache management
  * Proper interface for SuperSlab cache integration

REFACTORED CORE SYSTEMS:
- core/hakmem_super_registry.c
  * Moved registration logic from scattered locations
  * Centralized SuperSlab registry management

- core/hakmem_tiny.c
  * Removed 27 lines of redundant initialization
  * Simplified through Box architecture

- core/hakmem_tiny_alloc.inc
  * Streamlined allocation path to use Gatekeeper
  * Removed legacy decision logic

- core/box/ss_allocation_box.c/h
  * Dramatically simplified allocation policy
  * Removed conditional branches for unused strategies
  * Focused on current Box-based approach

BUILD SYSTEM:
- Updated Makefile for archive structure
- Removed obsolete object file references
- Maintained build compatibility

SAFETY & TESTING:
- All deletions verified: no broken references
- Build verification: RELEASE=0 and RELEASE=1 pass
- Smoke tests: 100% pass rate
- Functional verification: allocation/free intact

Architecture Consolidation:
Before: Multiple overlapping allocation paths with legacy code branches
After:  Single unified path through Gatekeeper Boxes with clear architecture

Benefits:
- Reduced code size and complexity
- Improved maintainability
- Single source of truth for allocation logic
- Better diagnostic/observability hooks
- Foundation for future optimizations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-04 14:22:48 +09:00
6ac6f5ae1b Refactor: Split hakmem_tiny_superslab.c + unified backend exit point
Major refactoring to improve maintainability and debugging:

1. Split hakmem_tiny_superslab.c (1521 lines) into 7 focused files:
   - superslab_allocate.c: SuperSlab allocation/deallocation
   - superslab_backend.c: Backend allocation paths (legacy, shared)
   - superslab_ace.c: ACE (Adaptive Cache Engine) logic
   - superslab_slab.c: Slab initialization and bitmap management
   - superslab_cache.c: LRU cache and prewarm cache management
   - superslab_head.c: SuperSlabHead management and expansion
   - superslab_stats.c: Statistics tracking and debugging

2. Created hakmem_tiny_superslab_internal.h for shared declarations

3. Added superslab_return_block() as single exit point for header writing:
   - All backend allocations now go through this helper
   - Prevents bugs where headers are forgotten in some paths
   - Makes future debugging easier

4. Updated Makefile for new file structure

5. Added header writing to ss_legacy_backend_box.c and
   ss_unified_backend_box.c (though not currently linked)

Note: Header corruption bug in Larson benchmark still exists.
Class 1-6 allocations go through TLS refill/carve paths, not backend.
Further investigation needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 05:13:04 +09:00