Commit Graph

4 Commits

Author SHA1 Message Date
e7710982f8 Phase 2-Opt1: Force inline range check functions (neutral perf)
Changes:
- smallmid_is_in_range(): Add __attribute__((always_inline))
- mid_is_in_range(): Add __attribute__((always_inline))

Expected: Reduce function call overhead in Front Gate routing
Result: Neutral performance (~72M ops/s, same as Phase 1 final)

Analysis:
- Compiler was already inlining these simple functions with -O3 -flto
- 36M branches identified by perf are NOT from Front Gate routing
- Most branches are inside allocators (tiny_alloc, free, etc.)
- Front Gate optimization had minimal impact, as predicted

Next: SuperSlab size optimization (clear 3-5% benefit expected)

Files:
- core/hakmem_smallmid.h:116-119
- core/hakmem_mid_mt.h:228-231

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-28 18:14:31 +09:00
6818e350c4 Phase 16: Dynamic Tiny/Mid Boundary with A/B Testing (ENV-controlled)
IMPLEMENTATION:
===============
Add dynamic boundary adjustment between Tiny and Mid allocators via
HAKMEM_TINY_MAX_CLASS environment variable for performance tuning.

Changes:
--------
1. hakmem_tiny.h/c: Add tiny_get_max_size() - reads ENV and maps class
   to max usable size (default: class 7 = 1023B, can reduce to class 5 = 255B)

2. hakmem_mid_mt.h/c: Add mid_get_min_size() - returns tiny_get_max_size() + 1
   to ensure no size gap between allocators

3. hak_alloc_api.inc.h: Replace static TINY_MAX_SIZE with dynamic
   tiny_get_max_size() call in allocation routing logic

4. Size gap fix: Mid's range now dynamically adjusts based on Tiny's max
   (prevents 256-1023B from falling through when HAKMEM_TINY_MAX_CLASS=5)

A/B BENCHMARK RESULTS:
======================
Config A (Default, C0-C7, Tiny up to 1023B):
  128B:  6.34M ops/s  |  256B:  6.34M ops/s
  512B:  5.55M ops/s  |  1024B: 5.91M ops/s

Config B (Reduced, C0-C5, Tiny up to 255B):
  128B:  1.38M ops/s (-78%)  |  256B:  1.36M ops/s (-79%)
  512B:  1.33M ops/s (-76%)  |  1024B: 1.37M ops/s (-77%)

FINDINGS:
=========
 Size gap fixed - no OOM crashes with HAKMEM_TINY_MAX_CLASS=5
 Severe performance degradation (-76% to -79%) when reducing Tiny coverage
 Even 128B degraded (should still use Tiny) - possible class filtering issue
⚠️  Mid's coarse size classes (8KB/16KB/32KB) cause fragmentation for small sizes

HYPOTHESIS:
-----------
Mid allocator uses 8KB blocks for all 256-1024B allocations, causing:
- Severe internal fragmentation (1024B request → 8KB block = 87% waste)
- Poor cache utilization
- Consistent ~1.3M ops/s across all sizes (same 8KB class)

RECOMMENDATION:
===============
**Keep default HAKMEM_TINY_MAX_CLASS=7 (C0-C7, up to 1023B)**

Reducing Tiny coverage is COUNTERPRODUCTIVE with current Mid allocator design.
To make this viable, Mid would need finer size classes for 256B-8KB range.

ENV USAGE (for future experimentation):
----------------------------------------
export HAKMEM_TINY_MAX_CLASS=7  # Default (C0-C7, up to 1023B)
export HAKMEM_TINY_MAX_CLASS=5  # Reduced (C0-C5, up to 255B) - NOT recommended

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-16 01:26:48 +09:00
0d42913efe Fix 1KB-8KB allocation gap: Close Tiny/Mid boundary
Problem: 1024B allocations fell through to mmap (1000x slowdown)
- TINY_MAX_SIZE: 1023B (C7 usable size with 1-byte header)
- MID_MIN_SIZE:  8KB (was too large)
- Gap: 1KB-8KB → no allocator handled → mmap fallback → syscall hell

Solution: Lower MID_MIN_SIZE to 1KB (ChatGPT recommendation)
- Tiny: 0-1023B (header-based, C7 usable=1023B)
- Mid:  1KB-32KB (closes gap, uses 8KB class for sub-8KB sizes)
- Pool: 8KB-52KB (parallel, Pool takes priority)

Results (bench_fixed_size 1024B, workset=128, 200K iterations):
- Before: 82K ops/s (mmap flood: 1000+ syscalls/iter)
- After:  489K ops/s (Mid allocator: ~30 mmap total)
- Improvement: 6.0x faster 
- No hang: Completes in 0.4s (was timing out) 

Syscall reduction (1000 iterations):
- mmap:    1029 → 30   (-97%) 
- munmap:  1003 → 3    (-99%) 
- mincore: 1000 → 1000 (unchanged, separate issue)

Related: Phase 13-A (TinyHeapV2), workset=128 debug investigation

🤝 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-15 05:51:58 +09:00
52386401b3 Debug Counters Implementation - Clean History
Major Features:
- Debug counter infrastructure for Refill Stage tracking
- Free Pipeline counters (ss_local, ss_remote, tls_sll)
- Diagnostic counters for early return analysis
- Unified larson.sh benchmark runner with profiles
- Phase 6-3 regression analysis documentation

Bug Fixes:
- Fix SuperSlab disabled by default (HAKMEM_TINY_USE_SUPERSLAB)
- Fix profile variable naming consistency
- Add .gitignore patterns for large files

Performance:
- Phase 6-3: 4.79 M ops/s (has OOM risk)
- With SuperSlab: 3.13 M ops/s (+19% improvement)

This is a clean repository without large log files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-05 12:31:14 +09:00