Add Larson performance analysis and optimized profile
Ultrathink analysis reveals root cause of 4x performance gap:

Key Findings:
- Single-thread: HAKMEM 0.46M ops/s vs system 4.29M ops/s (10.7%)
- Multi-thread: HAKMEM 1.81M ops/s vs system 7.23M ops/s (25.0%)
- Root cause: malloc() entry point has 8+ branch checks
- Bottleneck: Fast Path is structurally complex vs system tcache

Files Added:
- LARSON_PERFORMANCE_ANALYSIS_2025_11_05.md: Detailed analysis with 3 optimization strategies
- scripts/profiles/tinyhot_optimized.env: CLAUDE.md-based optimized config

Proposed Solutions:
- Option A: Optimize malloc() guard checks (+200-400% expected)
- Option B: Improve refill efficiency (+30-50% expected)
- Option C: Complete Fast Path simplification (+400-800% expected)

Target: Achieve 60-80% of system malloc performance
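The percentages and the 60-80% target can be rechecked with a few lines of shell arithmetic. This is only a sketch: the throughput figures are copied from the message above, and the helper names `ratio_pct` and `target_range` are hypothetical, not part of the repository.

```shell
#!/bin/sh
# Throughput figures (M ops/s) are copied from the commit message.
# ratio_pct and target_range are hypothetical helper names.

# Print HAKMEM throughput as a percentage of system malloc throughput.
ratio_pct() {
    awk -v h="$1" -v s="$2" 'BEGIN { printf "%.1f\n", 100 * h / s }'
}

# Print the throughput range (M ops/s) that corresponds to the
# 60-80% target for a given system malloc figure.
target_range() {
    awk -v s="$1" 'BEGIN { printf "%.2f-%.2f\n", 0.60 * s, 0.80 * s }'
}

echo "single-thread: $(ratio_pct 0.46 4.29)% of system"
echo "multi-thread:  $(ratio_pct 1.81 7.23)% of system"
echo "single-thread target: $(target_range 4.29) M ops/s"
echo "multi-thread target:  $(target_range 7.23) M ops/s"
```

Under these figures the target corresponds to roughly 2.57-3.43M ops/s single-threaded and 4.34-5.78M ops/s multi-threaded.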
25
scripts/profiles/tinyhot_optimized.env
Normal file
@@ -0,0 +1,25 @@
# CLAUDE.md optimized settings for Larson
export HAKMEM_TINY_FAST_PATH=1
export HAKMEM_TINY_USE_SUPERSLAB=1
export HAKMEM_USE_SUPERSLAB=1
export HAKMEM_TINY_SS_ADOPT=1
export HAKMEM_WRAP_TINY=1

# Key optimizations from CLAUDE.md
export HAKMEM_TINY_FAST_CAP=16 # Reduced from 64
export HAKMEM_TINY_FAST_CAP_0=16
export HAKMEM_TINY_FAST_CAP_1=16
export HAKMEM_TINY_REFILL_COUNT_HOT=64

# Disable magazine layers
export HAKMEM_TINY_TLS_SLL=1
export HAKMEM_TINY_TLS_LIST=0
export HAKMEM_TINY_HOTMAG=0

# Debug OFF
export HAKMEM_TINY_TRACE_RING=0
export HAKMEM_SAFE_FREE=0
export HAKMEM_TINY_REMOTE_GUARD=0
export HAKMEM_DEBUG_COUNTERS=0

export HAKMEM_TINY_PHASE6_BOX_REFACTOR=1