Files
hakmem/core/box
Moe Charm (CI) 490b1c132a Phase 7-Step1: Unified front path branch hint reversal (+54.2% improvement!)
Performance Results (bench_random_mixed, ws=256):
- Before: 52.3 M ops/s (Phase 5/6 baseline)
- After:  80.6 M ops/s (+54.2% improvement, +28.3M ops/s)

Implementation:
- Changed __builtin_expect(TINY_FRONT_UNIFIED_GATE_ENABLED, 0) → (..., 1)
- Applied to BOTH malloc and free paths
- Lines changed: 137 (malloc), 190 (free)

Root Cause (from ChatGPT + Task agent analysis):
- Unified fast path existed but was marked UNLIKELY (hint = 0)
- Compiler optimized for legacy path, not unified cache path
- malloc/free consumed 43% CPU due to branch misprediction
- Reversing hint: unified path now primary, legacy path fallback

Impact Analysis:
- Tiny allocations now hit malloc_tiny_fast() → Unified Cache → SuperSlab
- Legacy layers (FastCache/SFC/HeapV2/TLS SLL) still exist but cold
- Next step: Compile-time elimination of legacy paths (Step 2)

Code Changes:
- core/box/hak_wrappers.inc.h:137 (malloc path)
- core/box/hak_wrappers.inc.h:190 (free path)
- Total: 2 lines changed (4 lines including comments)

Why This Works:
- CPU branch predictor now expects unified path
- Cache locality improved (unified path hot, legacy path cold)
- Instruction cache pressure reduced (hot path smaller)

Next Steps (ChatGPT recommendations):
1.  free side hint reversal (DONE - already applied)
2. ⏸️ Compile-time unified ON fixed (Step 2)
3. ⏸️ Document Phase 7 results (Step 3)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 16:17:34 +09:00
..