Add Larson performance analysis and optimized profile

Ultrathink analysis reveals the root cause of the performance gap vs system malloc (about 9x single-thread, 4x multi-thread):

Key Findings:
- Single-thread: HAKMEM 0.46M ops/s vs system 4.29M ops/s (10.7% of system)
- Multi-thread: HAKMEM 1.81M ops/s vs system 7.23M ops/s (25.0% of system)
- Root cause: the malloc() entry point performs 8+ branch checks
- Bottleneck: the Fast Path is structurally more complex than the system allocator's tcache
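
The percentages above can be rechecked from the raw throughput figures. The following is a minimal sketch, not part of the commit; the helper names `pct_of_system` and `slowdown` are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of HAKMEM) for rechecking the reported ratios.

# Percentage of system malloc throughput reached by HAKMEM, one decimal place.
pct_of_system() {
    LC_ALL=C awk -v h="$1" -v s="$2" 'BEGIN { printf "%.1f", h / s * 100 }'
}

# Gap factor: how many times higher the system malloc throughput is.
slowdown() {
    LC_ALL=C awk -v h="$1" -v s="$2" 'BEGIN { printf "%.1f", s / h }'
}

# Figures are in M ops/s, taken from the Key Findings list.
echo "single-thread: $(pct_of_system 0.46 4.29)% of system, gap $(slowdown 0.46 4.29)x"
echo "multi-thread:  $(pct_of_system 1.81 7.23)% of system, gap $(slowdown 1.81 7.23)x"
```

The multi-thread figures give the 4x gap named in the summary; the single-thread figures give a larger gap of roughly 9x.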

Files Added:
- LARSON_PERFORMANCE_ANALYSIS_2025_11_05.md: Detailed analysis with 3 optimization strategies
- scripts/profiles/tinyhot_optimized.env: CLAUDE.md-based optimized config

Proposed Solutions:
- Option A: Optimize malloc() guard checks (+200-400% expected)
- Option B: Improve refill efficiency (+30-50% expected)
- Option C: Complete Fast Path simplification (+400-800% expected)

Target: Achieve 60-80% of system malloc performance
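
To make the target concrete, the sketch below converts "60-80% of system malloc" into absolute throughput using the measured system figures. It also projects a "+N%" gain, assuming (this is not stated in the commit) that the gain is relative to the single-thread HAKMEM baseline and that "+200%" means three times the baseline. The helper names `scale` and `after_gain` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of HAKMEM) for translating the stated goals
# into absolute throughput. All figures are in M ops/s.

# Throughput equal to pct percent of a base figure, two decimals.
scale() {
    LC_ALL=C awk -v base="$1" -v pct="$2" 'BEGIN { printf "%.2f", base * pct / 100 }'
}

# Throughput after a "+gain%" improvement over the base figure.
# Assumption: "+200%" means base * 3, not base * 2.
after_gain() {
    LC_ALL=C awk -v base="$1" -v gain="$2" 'BEGIN { printf "%.2f", base * (1 + gain / 100) }'
}

echo "single-thread target: $(scale 4.29 60) to $(scale 4.29 80)"
echo "multi-thread target:  $(scale 7.23 60) to $(scale 7.23 80)"
echo "Option A, single-thread: $(after_gain 0.46 200) to $(after_gain 0.46 400)"
echo "Option C, single-thread: $(after_gain 0.46 400) to $(after_gain 0.46 800)"
```

Under those assumptions the single-thread target works out to roughly 2.57 to 3.43M ops/s, which gives a reference range for comparing the projected gains of each option.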
Claude
2025-11-05 04:03:10 +00:00
parent b4e4416544
commit f0c87d0cac
2 changed files with 372 additions and 0 deletions

@@ -0,0 +1,25 @@
# CLAUDE.md optimized settings for Larson
export HAKMEM_TINY_FAST_PATH=1
export HAKMEM_TINY_USE_SUPERSLAB=1
export HAKMEM_USE_SUPERSLAB=1
export HAKMEM_TINY_SS_ADOPT=1
export HAKMEM_WRAP_TINY=1
# Key optimizations from CLAUDE.md
export HAKMEM_TINY_FAST_CAP=16 # Reduced from 64
export HAKMEM_TINY_FAST_CAP_0=16
export HAKMEM_TINY_FAST_CAP_1=16
export HAKMEM_TINY_REFILL_COUNT_HOT=64
# Disable magazine layers
export HAKMEM_TINY_TLS_SLL=1
export HAKMEM_TINY_TLS_LIST=0
export HAKMEM_TINY_HOTMAG=0
# Debug OFF
export HAKMEM_TINY_TRACE_RING=0
export HAKMEM_SAFE_FREE=0
export HAKMEM_TINY_REMOTE_GUARD=0
export HAKMEM_DEBUG_COUNTERS=0
export HAKMEM_TINY_PHASE6_BOX_REFACTOR=1
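
Because the profile consists only of `export` lines, it can be applied by sourcing it before launching a benchmark. The following is a minimal sketch of one way to do that; `run_with_profile` is a hypothetical helper, and the benchmark binary name in the usage comment is a placeholder, since the commit does not show how the profile is consumed.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a profile's exports applied,
# without leaking them into the calling shell.
run_with_profile() {
    profile="$1"
    shift
    # "." searches PATH for names without a slash, so force a relative path.
    case "$profile" in
        */*) ;;
        *) profile="./$profile" ;;
    esac
    [ -r "$profile" ] || { echo "profile not readable: $profile" >&2; return 1; }
    (
        # Subshell: the exported variables disappear when the command exits.
        . "$profile" && exec "$@"
    )
}

# Usage (the binary name and its arguments are placeholders):
#   run_with_profile scripts/profiles/tinyhot_optimized.env ./your_larson_binary <args>
```

Running in a subshell keeps repeated benchmark runs independent, so switching between profiles does not require unsetting variables from an earlier run.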