Phase 19-2: Ultra SLIM debug logging and root cause analysis

Add comprehensive statistics tracking and debug logging to Ultra SLIM 4-layer
fast path to diagnose why it wasn't being called.

Changes:
1. core/box/ultra_slim_alloc_box.h
   - Move statistics tracking (ultra_slim_track_hit/miss) before first use
   - Add debug logging in ultra_slim_print_stats()
   - Track call counts to verify Ultra SLIM path execution
   - Enhanced stats output with per-class breakdown

2. core/tiny_alloc_fast.inc.h
   - Add debug logging at Ultra SLIM gate (line 700-710)
   - Log whether Ultra SLIM mode is enabled on first allocation
   - Helps diagnose allocation path routing

Root Cause Analysis (with ChatGPT):
========================================

Problem: Ultra SLIM was not being called in default configuration
- ENV: HAKMEM_TINY_ULTRA_SLIM=1
- Observed: Statistics counters remained zero
- Expected: Ultra SLIM 4-layer path to handle allocations

Investigation:
- malloc() → Front Gate Unified Cache → complete (default path)
- Ultra SLIM gate in tiny_alloc_fast() never reached
- Front Gate/Unified Cache handles 100% of allocations

Solution to Test Ultra SLIM:
Turn OFF Front Gate and Unified Cache to force old Tiny path:

  HAKMEM_TINY_ULTRA_SLIM=1 \
  HAKMEM_FRONT_GATE_UNIFIED=0 \
  HAKMEM_TINY_UNIFIED_CACHE=0 \
    ./out/release/bench_random_mixed_hakmem 100000 256 42

Results:
 Ultra SLIM gate logged: ENABLED
 Statistics: 49,526 hits, 542 misses (98.9% hit rate)
 Throughput: 9.1M ops/s (100K iterations)
⚠️  10M iterations: TLS SLL corruption (not Ultra SLIM bug)

Secondary Discovery (ChatGPT Analysis):
========================================

TLS SLL C6/C7 corruption is NOT caused by Ultra SLIM:

Evidence:
- Same [TLS_SLL_POP_POST_INVALID] errors occur with Ultra SLIM OFF
- Ultra SLIM OFF + FrontGate/Unified OFF: 9.2M ops/s with same errors
- Root cause: Existing TLS SLL bug exposed when bypassing Front Gate
- Ultra SLIM never pushes to TLS SLL (only pops)

Conclusion:
- Ultra SLIM implementation is correct 
- Default configuration (Front Gate/Unified ON) is stable: 60M ops/s
- TLS SLL bugs are pre-existing, unrelated to Ultra SLIM
- Ultra SLIM can be safely enabled with default configuration

Performance Summary:
- Front Gate/Unified ON (default): 60.1M ops/s  stable
- Ultra SLIM works correctly when path is reachable
- No changes needed to Ultra SLIM code

Next Steps:
1. Address workset=8192 SEGV (existing bug, high priority)
2. TLS SLL C6/C7 corruption (separate existing issue)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Moe Charm (CI)
2025-11-22 06:50:38 +09:00
parent 674965080f
commit 38e4e8d4c2
2 changed files with 49 additions and 19 deletions

View File

@ -696,6 +696,19 @@ static inline void* tiny_alloc_fast(size_t size) {
// Expected: 90-110M ops/s (mimalloc parity)
// Architecture: Init Safety + Size-to-Class + ACE Learning + TLS SLL Direct
// Note: ACE learning preserved (HAKMEM's differentiator vs mimalloc)
// Debug: Check if Ultra SLIM is enabled (first call only)
static __thread int debug_checked = 0;
if (!debug_checked) {
int enabled = ultra_slim_mode_enabled();
if (enabled) {
fprintf(stderr, "[TINY_ALLOC_FAST] Ultra SLIM gate: ENABLED (will use 4-layer path)\n");
} else {
fprintf(stderr, "[TINY_ALLOC_FAST] Ultra SLIM gate: DISABLED (will use standard path)\n");
}
debug_checked = 1;
}
if (__builtin_expect(ultra_slim_mode_enabled(), 0)) {
return ultra_slim_alloc_with_refill(size);
}