093f362231
Add Page Box layer for C7 class optimization
...
- Implement tiny_page_box.c/h: per-thread page cache between UC and Shared Pool
- Integrate Page Box into Unified Cache refill path
- Remove legacy SuperSlab implementation (merged into smallmid)
- Add HAKMEM_TINY_PAGE_BOX_CLASSES env var for selective class enabling
- Update bench_random_mixed.c with Page Box statistics
Current status: Implementation safe, no regressions.
Page Box ON/OFF shows minimal difference; the pool strategy needs tuning.
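A minimal sketch of the Page Box idea described above: a small per-thread cache of free pages sitting between the Unified Cache and the Shared Pool, gated by a per-class mask (as the HAKMEM_TINY_PAGE_BOX_CLASSES env var suggests). All names, sizes, and the struct layout are illustrative assumptions, not the real tiny_page_box.c API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-thread "Page Box": a tiny LIFO cache of free pages
 * between the Unified Cache and the Shared Pool. Illustrative only. */
#define PAGE_BOX_CAP 4

typedef struct {
    void    *pages[PAGE_BOX_CAP]; /* cached page pointers, LIFO */
    int      count;               /* number of cached pages */
    uint32_t class_mask;          /* which size classes may use the box */
} TinyPageBox;

/* One box per thread; refill/free paths consult it before the pool. */
static __thread TinyPageBox g_page_box;

/* Returns a cached page for class `cls`, or NULL to fall through to
 * the shared pool (the slow path). */
static void *page_box_pop(TinyPageBox *box, int cls) {
    if (!(box->class_mask & (1u << cls)) || box->count == 0)
        return NULL;
    return box->pages[--box->count];
}

/* Caches a page freed by this thread; returns 0 if the box is full or
 * the class is disabled, meaning the page goes back to the pool. */
static int page_box_push(TinyPageBox *box, int cls, void *page) {
    if (!(box->class_mask & (1u << cls)) || box->count >= PAGE_BOX_CAP)
        return 0;
    box->pages[box->count++] = page;
    return 1;
}
```

The class mask makes selective enabling cheap: a disabled class costs one branch and falls straight through to the existing refill path.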
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 15:31:44 +09:00
bd5e97f38a
Save current state before investigating TLS_SLL_HDR_RESET
2025-12-03 10:34:39 +09:00
4ef0171bc0
feat: Add ACE allocation failure tracing and debug hooks
...
This commit introduces a comprehensive tracing mechanism for allocation failures within the Adaptive Cache Engine (ACE) component, allowing precise identification of the root cause of Out-Of-Memory (OOM) issues related to ACE allocations.
Key changes include:
- **ACE Tracing Implementation**:
- Added an environment variable to enable/disable detailed logging of allocation failures.
- Instrumented the relevant allocation paths to distinguish between "Threshold" (size class mismatch), "Exhaustion" (pool depletion), and "MapFail" (OS memory allocation failure).
- **Build System Fixes**:
- Corrected the build configuration so the new tracing code is properly linked, resolving a link-time error.
- **LD_PRELOAD Wrapper Adjustments**:
- Investigated the wrapper's behavior under LD_PRELOAD, particularly how it interacts with the allocator's initialization checks.
- Enabled debugging flags for the LD_PRELOAD environment to prevent unintended fallbacks to the system allocator for non-tiny allocations, allowing comprehensive testing of the allocator.
- **Debugging & Verification**:
- Introduced temporary verbose logging to pinpoint execution flow issues within interception and routing. These temporary logs have been removed.
- Created a test script to facilitate testing of the tracing features.
This feature will significantly aid in diagnosing and resolving allocation-related OOM issues by providing clear insights into the failure pathways.
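The failure-cause classification described above can be sketched as follows. The enum values mirror the three causes named in the commit (Threshold, Exhaustion, MapFail); the function names and the HAKMEM_ACE_TRACE env var name are assumptions for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: tag each ACE allocation failure with a cause
 * and log it only when the (assumed) env var gate is set. */
typedef enum {
    ACE_FAIL_THRESHOLD,  /* request fit no size class */
    ACE_FAIL_EXHAUSTION, /* pool had no free blocks left */
    ACE_FAIL_MAPFAIL     /* the OS refused to map more memory */
} AceFailCause;

static const char *ace_fail_name(AceFailCause c) {
    switch (c) {
    case ACE_FAIL_THRESHOLD:  return "Threshold";
    case ACE_FAIL_EXHAUSTION: return "Exhaustion";
    default:                  return "MapFail";
    }
}

/* Read the gate once, then reuse the cached answer. */
static int ace_trace_enabled(void) {
    static int cached = -1;
    if (cached < 0) {
        const char *v = getenv("HAKMEM_ACE_TRACE"); /* assumed name */
        cached = (v && *v && *v != '0') ? 1 : 0;
    }
    return cached;
}

/* Called from each failing allocation path with its specific cause. */
static void ace_trace_fail(AceFailCause cause, size_t size) {
    if (ace_trace_enabled())
        fprintf(stderr, "[ACE] alloc fail: cause=%s size=%zu\n",
                ace_fail_name(cause), size);
}
```

Separating the three causes at the failure site is what makes the OOM diagnosis precise: a "MapFail" points at the OS, an "Exhaustion" at pool sizing, a "Threshold" at routing.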
2025-12-01 16:37:59 +09:00
e769dec283
Refactor: Clean up SuperSlab shared pool code
...
- Removed unused/disabled L0 cache implementation from core/hakmem_shared_pool.c.
- Deleted stale backup file core/hakmem_tiny_superslab.c.bak.
- Removed untracked and obsolete shared_pool source files.
2025-11-30 15:27:53 +09:00
20f8d6f179
Cleanup: Add tiny_debug_api.h to eliminate guard/failfast implicit warnings
...
Created central header for debug instrumentation API to fix implicit
function declaration warnings across the codebase.
Changes:
1. Created core/tiny_debug_api.h
- Declares guard system API (3 functions)
- Declares failfast debugging API (3 functions)
- Uses forward declarations for SuperSlab/TinySlabMeta
2. Updated 3 files to include tiny_debug_api.h:
- core/tiny_region_id.h (removed inline externs)
- core/hakmem_tiny_tls_ops.h
- core/tiny_superslab_alloc.inc.h
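The shape of such a central debug header can be sketched as below: forward declarations let callers get real prototypes (eliminating implicit-declaration warnings) without pulling in allocator internals. The signatures are assumptions; the stub bodies at the end exist only to make this sketch self-contained.

```c
/* Illustrative shape of core/tiny_debug_api.h; signatures assumed. */
#ifndef TINY_DEBUG_API_H
#define TINY_DEBUG_API_H

/* Forward declarations: callers only pass pointers, so the full
 * struct definitions are not needed here. */
struct SuperSlab;
struct TinySlabMeta;

/* Guard system API (3 functions) */
int  tiny_guard_is_enabled(void);
void tiny_guard_on_alloc(struct SuperSlab *ss, void *ptr);
void tiny_guard_on_invalid(struct TinySlabMeta *meta, void *ptr);

/* Failfast debugging API (3 functions) */
void tiny_failfast_log(const char *fmt, ...);
void tiny_failfast_abort_ptr(void *ptr, const char *why);
int  tiny_refill_failfast_level(void);

#endif /* TINY_DEBUG_API_H */

/* Minimal stub definitions so this sketch links on its own; the real
 * implementations live elsewhere in the allocator. */
int tiny_guard_is_enabled(void) { return 0; }
int tiny_refill_failfast_level(void) { return 0; }
```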
Warnings eliminated (6 of 11 total):
✅ tiny_guard_is_enabled()
✅ tiny_guard_on_alloc()
✅ tiny_guard_on_invalid()
✅ tiny_failfast_log()
✅ tiny_failfast_abort_ptr()
✅ tiny_refill_failfast_level()
Remaining warnings (deferred to P1):
- ss_active_add (2 occurrences)
- expand_superslab_head
- hkm_ace_set_tls_capacity
- smallmid_backend_free
Impact:
- Cleaner build output
- Better type safety for debug functions
- No behavior changes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 06:47:13 +09:00
f8b0f38f78
ENV Cleanup Step 8: Gate HAKMEM_SUPER_LOOKUP_DEBUG in header
...
Gate HAKMEM_SUPER_LOOKUP_DEBUG environment variable behind
#if !HAKMEM_BUILD_RELEASE in hakmem_super_registry.h inline function.
Changes:
- Wrap s_dbg initialization in conditional compilation
- Release builds use constant s_dbg = 0 for complete elimination
- Debug logging in hak_super_lookup() now fully compiled out in release
Performance: 30.3M ops/s Larson (stable, no regression)
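The gating pattern described above can be sketched like this. HAKMEM_BUILD_RELEASE and the HAKMEM_SUPER_LOOKUP_DEBUG env var come from the commit; the function body is an illustration of the pattern, not the real hakmem_super_registry.h code.

```c
#include <stdlib.h>

/* In release builds the flag collapses to a compile-time constant 0,
 * so the compiler removes the logging branch entirely. */
#ifndef HAKMEM_BUILD_RELEASE
#define HAKMEM_BUILD_RELEASE 0
#endif

static inline int super_lookup_debug_enabled(void) {
#if !HAKMEM_BUILD_RELEASE
    /* Debug build: read the env var once, cache the answer. */
    static int s_dbg = -1;
    if (s_dbg < 0) {
        const char *v = getenv("HAKMEM_SUPER_LOOKUP_DEBUG");
        s_dbg = (v && *v && *v != '0') ? 1 : 0;
    }
    return s_dbg;
#else
    /* Release build: constant 0, dead code is eliminated. */
    return 0;
#endif
}
```

Any call site written as `if (super_lookup_debug_enabled()) { ... }` then compiles to nothing in release builds, which is why there is no regression to measure.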
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 01:45:45 +09:00
bcfb4f6b59
Remove dead code: UltraHot, RingCache, FrontC23, Class5 Hotpath
...
(cherry-picked from 225b6fcc7, conflicts resolved)
2025-11-26 12:33:49 +09:00
25d963a4aa
Code Cleanup: Remove false positives, redundant validations, and reduce verbose logging
...
Following the C7 stride upgrade fix (commit 23c0d9541), this commit performs
comprehensive cleanup to improve code quality and reduce debug noise.
## Changes
### 1. Disable False Positive Checks (tiny_nextptr.h)
- **Disabled**: NXT_MISALIGN validation block with `#if 0`
- **Reason**: Produces false positives due to slab base offsets (2048, 65536)
not being stride-aligned, causing all blocks to appear "misaligned"
- **TODO**: Reimplement to check stride DISTANCE between consecutive blocks
instead of absolute alignment to stride boundaries
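The TODO above can be sketched as a distance check. Because slab bases sit at offsets (2048, 65536) that are not stride-aligned, absolute alignment always fails; the distance between consecutive carved blocks, however, must be an exact multiple of the stride. The function name is illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* Validate stride DISTANCE between two blocks rather than absolute
 * alignment to stride boundaries (which false-positives on offset
 * slab bases). Sketch only; not the real tiny_nextptr.h check. */
static int stride_distance_ok(const void *prev, const void *next,
                              size_t stride) {
    uintptr_t a = (uintptr_t)prev, b = (uintptr_t)next;
    uintptr_t dist = (b > a) ? (b - a) : (a - b);
    return stride != 0 && (dist % stride) == 0;
}
```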
### 2. Remove Redundant Geometry Validations
**hakmem_tiny_refill_p0.inc.h (P0 batch refill)**
- Removed 25-line CARVE_GEOMETRY_FIX validation block
- Replaced with NOTE explaining redundancy
- **Reason**: Stride table is now correct in tiny_block_stride_for_class(),
defense-in-depth validation adds overhead without benefit
**ss_legacy_backend_box.c (legacy backend)**
- Removed 18-line LEGACY_FIX_GEOMETRY validation block
- Replaced with NOTE explaining redundancy
- **Reason**: Shared_pool validates geometry at acquisition time
### 3. Reduce Verbose Logging
**hakmem_shared_pool.c (sp_fix_geometry_if_needed)**
- Made SP_FIX_GEOMETRY logging conditional on `!HAKMEM_BUILD_RELEASE`
- **Reason**: Geometry fixes are expected during stride upgrades,
no need to log in release builds
### 4. Verification
- Build: ✅ Successful (LTO warnings expected)
- Test: ✅ 10K iterations (1.87M ops/s, no crashes)
- NXT_MISALIGN false positives: ✅ Eliminated
## Files Modified
- core/tiny_nextptr.h - Disabled false positive NXT_MISALIGN check
- core/hakmem_tiny_refill_p0.inc.h - Removed redundant CARVE validation
- core/box/ss_legacy_backend_box.c - Removed redundant LEGACY validation
- core/hakmem_shared_pool.c - Made SP_FIX_GEOMETRY logging debug-only
## Impact
- **Code clarity**: Removed 43 lines of redundant validation code
- **Debug noise**: Reduced false positive diagnostics
- **Performance**: Eliminated overhead from redundant geometry checks
- **Maintainability**: Single source of truth for geometry validation
🧹 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-21 23:00:24 +09:00
8786d58fc8
Phase 17-2: Small-Mid Dedicated SuperSlab Backend (experiment result: 70% page faults, no performance improvement)
...
Summary:
========
Phase 17-2 implements dedicated SuperSlab backend for Small-Mid allocator (256B-1KB).
Result: No performance improvement (-0.9%), worse than Phase 17-1 (+0.3%).
Root cause: 70% page fault (ChatGPT + perf profiling).
Conclusion: The dedicated Small-Mid layer strategy failed; Tiny SuperSlab optimization is needed.
Implementation:
===============
1. Dedicated Small-Mid SuperSlab pool (1MB, 16 slabs/SS)
- Separate from Tiny SuperSlab (no competition)
- Batch refill (8-16 blocks per TLS refill)
- Direct 0xb0 header writes (no Tiny delegation)
2. Backend architecture
- SmallMidSuperSlab: 1MB aligned region, fast ptr→SS lookup
- SmallMidSlabMeta: per-slab metadata (capacity/used/carved/freelist)
- SmallMidSSHead: per-class pool with LRU tracking
3. Batch refill implementation
- smallmid_refill_batch(): 8-16 blocks/call (vs 1 in Phase 17-1)
- Freelist priority → bump allocation fallback
- Auto SuperSlab expansion when exhausted
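The batch refill in item 3 can be modeled as below: fill up to `want` blocks, draining the slab freelist first and falling back to bump (carve) allocation. The struct layout and names are simplified assumptions; the real SmallMidSlabMeta differs.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct FreeNode { struct FreeNode *next; } FreeNode;

typedef struct {
    uint8_t  *base;     /* start of the slab's block area */
    size_t    stride;   /* bytes per block */
    size_t    capacity; /* total blocks in the slab */
    size_t    carved;   /* blocks handed out via bump so far */
    FreeNode *freelist; /* blocks returned by free() */
} SlabMeta;

/* Simplified model of smallmid_refill_batch(): returns the number of
 * blocks written to `out`, which may be < want when exhausted (the
 * real code then expands the SuperSlab). */
static size_t refill_batch(SlabMeta *m, void **out, size_t want) {
    size_t got = 0;
    /* 1) Freelist priority: reuse freed blocks first. */
    while (got < want && m->freelist) {
        out[got++] = m->freelist;
        m->freelist = m->freelist->next;
    }
    /* 2) Bump fallback: carve fresh blocks from the slab. */
    while (got < want && m->carved < m->capacity) {
        out[got++] = m->base + m->carved * m->stride;
        m->carved++;
    }
    return got;
}
```

Amortizing one backend call over 8-16 blocks is the whole point versus Phase 17-1's 1:1 delegation.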
Files Added:
============
- core/hakmem_smallmid_superslab.h: SuperSlab metadata structures
- core/hakmem_smallmid_superslab.c: Backend implementation (~450 lines)
Files Modified:
===============
- core/hakmem_smallmid.c: Removed Tiny delegation, added batch refill
- Makefile: Added hakmem_smallmid_superslab.o to build
- CURRENT_TASK.md: Phase 17 completion record + Phase 18 plan
A/B Benchmark Results:
======================
| Size | Phase 17-1 (ON) | Phase 17-2 (ON) | Delta | vs Baseline |
|--------|-----------------|-----------------|----------|-------------|
| 256B | 6.06M ops/s | 5.84M ops/s | -3.6% | -4.1% |
| 512B | 5.91M ops/s | 5.86M ops/s | -0.8% | +1.2% |
| 1024B | 5.54M ops/s | 5.44M ops/s | -1.8% | +0.4% |
| Avg | 5.84M ops/s | 5.71M ops/s | -2.2% | -0.9% |
Performance Analysis (ChatGPT + perf):
======================================
✅ Frontend (TLS/batch refill): OK
- Only 30% CPU time
- Batch refill logic is efficient
- Direct 0xb0 header writes work correctly
❌ Backend (SuperSlab allocation): BOTTLENECK
- 70% CPU time in asm_exc_page_fault
- mmap(1MB) → kernel page allocation → very slow
- New SuperSlab allocation per benchmark run
- No warm SuperSlab reuse (used counter never decrements)
Root Cause:
===========
Small-Mid allocates new SuperSlabs frequently:
alloc → TLS miss → refill → new SuperSlab → mmap(1MB) → page fault (70%)
Tiny reuses warm SuperSlabs:
alloc → TLS miss → refill → existing warm SuperSlab → no page fault
Key Finding: "70% page fault" reveals SuperSlab layer needs optimization,
NOT frontend layer (TLS/batch refill design is correct).
Lessons Learned:
================
1. ❌ Dedicated Small-Mid layer strategy failed (Phase 17-1: +0.3%, Phase 17-2: -0.9%)
2. ✅ Frontend implementation succeeded (30% CPU, batch refill works)
3. 🔥 70% page fault = SuperSlab allocation bottleneck
4. ✅ Tiny (6.08M ops/s) is already well-optimized, hard to beat
5. ✅ Layer separation doesn't improve performance - backend optimization needed
Next Steps (Phase 18):
======================
ChatGPT recommendation: Optimize Tiny SuperSlab (NOT Small-Mid specific layer)
Box SS-Reuse (Priority 1):
- Implement meta->freelist reuse (currently bump-only)
- Detect slab empty → return to shared_pool
- Reuse same SuperSlab for longer (reduce page faults)
- Target: 70% page fault → 5-10%, 2-4x improvement
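The empty-slab detection behind SS-Reuse reduces to a live-block counter, roughly as sketched here (names are illustrative, since this is a plan rather than landed code):

```c
#include <stddef.h>

/* Minimal model of per-slab reuse accounting: when the live count
 * drops to zero the slab can be returned to the shared pool, so later
 * refills reuse warm memory instead of mmap()ing fresh pages. */
typedef struct {
    size_t used; /* live blocks currently allocated from this slab */
} SlabCounter;

static void slab_on_alloc(SlabCounter *s) { s->used++; }

/* Returns 1 when the slab just became empty and should be handed
 * back to the shared pool for reuse. */
static int slab_on_free(SlabCounter *s) {
    return --s->used == 0;
}
```

The current code never decrements `used`, which is exactly why warm SuperSlabs are never reused and page faults dominate.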
Box SS-Prewarm (Priority 2):
- Pre-allocate SuperSlabs per class (Phase 11: +6.4%)
- Concentrate page faults at benchmark start
- Benchmark-only optimization
Small-Mid Implementation Status:
=================================
- ENV=0 by default (zero overhead, branch predictor learns)
- Complete separation from Tiny (no interference)
- Valuable as experimental record ("why dedicated layer failed")
- Can be removed later if needed (not blocking Tiny optimization)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 03:21:13 +09:00
ccccabd944
Phase 17-1: Small-Mid Allocator - TLS Frontend Cache (result: ±0.3%, layer separation successful)
...
Summary:
========
Phase 17-1 implements Small-Mid allocator as TLS frontend cache with Tiny backend delegation.
Result: Clean layer separation achieved with minimal overhead (±0.3%), but no performance gain.
Conclusion: Frontend-only approach is dead end. Phase 17-2 (dedicated backend) required for 2-3x target.
Implementation:
===============
1. Small-Mid TLS frontend (256B/512B/1KB - 3 classes)
- TLS freelist (32/24/16 capacity)
- Backend delegation to Tiny C5/C6/C7
- Header conversion (0xa0 → 0xb0)
2. Auto-adjust Tiny boundary
- When Small-Mid ON: Tiny auto-limits to C0-C5 (0-255B)
- When Small-Mid OFF: Tiny default C0-C7 (0-1023B)
- Prevents routing conflict
3. Routing order fix
- Small-Mid BEFORE Tiny (critical for proper execution)
- Fall-through on TLS miss
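A toy model of the frontend in item 1: a Small-Mid TLS freelist that, on a miss, delegates to the Tiny backend and rewrites the header tag from 0xa0 to 0xb0. The names, the one-byte-header layout, and the malloc-backed stand-in backend are all illustrative assumptions.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

enum { HDR_TINY = 0xa0, HDR_SMALLMID = 0xb0 };

typedef struct SMNode { struct SMNode *next; } SMNode;
static __thread SMNode *g_sm_freelist; /* per-thread frontend cache */

/* Stand-in for the Tiny backend: returns a block whose first byte is
 * the Tiny header tag (real backend is Tiny C5/C6/C7). */
static void *tiny_backend_alloc(size_t size) {
    uint8_t *p = malloc(size + 1);
    if (p) *p = HDR_TINY;
    return p;
}

static void *smallmid_alloc(size_t size) {
    if (g_sm_freelist) {                   /* TLS fast path */
        SMNode *n = g_sm_freelist;
        g_sm_freelist = n->next;
        return n;
    }
    uint8_t *p = tiny_backend_alloc(size); /* delegate on miss */
    if (p) *p = HDR_SMALLMID;              /* header conversion 0xa0 -> 0xb0 */
    return p;
}
```

The 1:1 delegation visible here is also the commit's diagnosed weakness: without batching, the frontend saves nothing over calling the backend directly.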
Files Modified:
===============
- core/hakmem_smallmid.h/c: TLS freelist + backend delegation
- core/hakmem_tiny.c: tiny_get_max_size() auto-adjust
- core/box/hak_alloc_api.inc.h: Routing order (Small-Mid → Tiny)
- CURRENT_TASK.md: Phase 17-1 results + Phase 17-2 plan
A/B Benchmark Results:
======================
| Size | Config A (OFF) | Config B (ON) | Delta | % Change |
|--------|----------------|---------------|----------|----------|
| 256B | 5.87M ops/s | 6.06M ops/s | +191K | +3.3% |
| 512B | 6.02M ops/s | 5.91M ops/s | -112K | -1.9% |
| 1024B | 5.58M ops/s | 5.54M ops/s | -35K | -0.6% |
| Overall| 5.82M ops/s | 5.84M ops/s | +20K | +0.3% |
Analysis:
=========
✅ SUCCESS: Clean layer separation (Small-Mid ↔ Tiny coexist)
✅ SUCCESS: Minimal overhead (±0.3% = measurement noise)
❌ FAIL: No performance gain (target was 2-4x)
Root Cause:
-----------
- Delegation overhead = TLS savings (net gain ≈ 0 instructions)
- Small-Mid TLS alloc: ~3-5 instructions
- Tiny backend delegation: ~3-5 instructions
- Header conversion: ~2 instructions
- No batching: 1:1 delegation to Tiny (no refill amortization)
Lessons Learned:
================
- Frontend-only approach ineffective (backend calls not reduced)
- Dedicated backend essential for meaningful improvement
- Clean separation achieved = solid foundation for Phase 17-2
Next Steps (Phase 17-2):
========================
- Dedicated Small-Mid SuperSlab backend (separate from Tiny)
- TLS batch refill (8-16 blocks per refill)
- Optimized 0xb0 header fast path (no delegation)
- Target: 12-15M ops/s (2.0-2.6x improvement)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 02:37:24 +09:00