|
|
030132f911
|
Phase 10: TLS/SFC aggressive cache tuning (syscall reduction failed)
Goal: Reduce backend transitions by increasing frontend hit rate
Result: +2% best case, syscalls unchanged (root cause: SuperSlab churn)
Implementation:
1. Cache capacity expansion (2-8x per-class)
- Hot classes (C0-C3): 4x increase (512 slots)
- Medium classes (C4-C6): 2-3x increase
- Class 7 (1KB): 2x increase (128 slots)
- Fast cache: 2x default capacity
2. Refill batch size increase (4-8x)
- Global default: 16 → 64 (4x)
- Hot classes: 128 (8x) via HAKMEM_TINY_REFILL_COUNT_HOT
- Mid classes: 96 (6x) via HAKMEM_TINY_REFILL_COUNT_MID
- Class 7: 64 → 128 (2x)
- SFC refill: 64 → 128 (2x)
3. Adaptive sizing aggressive parameters
- Grow threshold: 80% → 70% (expand earlier)
- Shrink threshold: 20% → 10% (shrink less)
- Growth rate: 2x → 1.5x (smoother growth)
- Max capacity: 2048 → 4096 (2x ceiling)
- Adapt frequency: Every 10 → 5 refills (more responsive)
Performance Results (100K iterations):
Before (Phase 9):
- Performance: 9.71M ops/s
- Syscalls: 1,729 (mmap:877, munmap:852)
After (Phase 10):
- Default settings: 8.77M ops/s (-9.7%) ⚠️
- Optimal ENV: 9.89M ops/s (+2%) ✅
- Syscalls: 1,729 (unchanged) ❌
Optimal ENV configuration:
export HAKMEM_TINY_REFILL_COUNT_HOT=256
export HAKMEM_TINY_REFILL_COUNT_MID=192
Root Cause Analysis:
Bottleneck is NOT TLS/SFC hit rate, but SuperSlab allocation churn:
- 877 SuperSlabs allocated (877MB via mmap)
- Phase 9 LRU cache not utilized (no frees during benchmark)
- All SuperSlabs retained until program exit
- System malloc: 9 syscalls vs HAKMEM: 1,729 syscalls (192x gap)
Conclusion:
TLS/SFC tuning cannot solve SuperSlab allocation policy problem.
Next step: Phase 11 SuperSlab Prewarm strategy to eliminate
mmap/munmap during benchmark execution.
ChatGPT review: Strategy validated, Option A (Prewarm) recommended.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-13 14:25:54 +09:00 |
|
|
|
707056b765
|
feat: Phase 7 + Phase 2 - Massive performance & stability improvements
Performance Achievements:
- Tiny allocations: +180-280% (21M → 59-70M ops/s random mixed)
- Single-thread: +24% (2.71M → 3.36M ops/s Larson)
- 4T stability: 0% → 95% (19/20 success rate)
- Overall: 91.3% of System malloc average (target was 40-55%) ✓
Phase 7 (Tasks 1-3): Core Optimizations
- Task 1: Header validation removal (Region-ID direct lookup)
- Task 2: Aggressive inline (TLS cache access optimization)
- Task 3: Pre-warm TLS cache (eliminate cold-start penalty)
Result: +180-280% improvement, 85-146% of System malloc
Critical Bug Fixes:
- Fix 64B allocation crash (size-to-class +1 for header)
- Fix 4T wrapper recursion bugs (BUG #7, #8, #10, #11)
- Remove malloc fallback (30% → 50% stability)
Phase 2a: SuperSlab Dynamic Expansion (CRITICAL)
- Implement mimalloc-style chunk linking
- Unlimited slab expansion (no more OOM at 32 slabs)
- Fix chunk initialization bug (bitmap=0x00000001 after expansion)
Files: core/hakmem_tiny_superslab.c/h, core/superslab/superslab_types.h
Result: 50% → 95% stability (19/20 4T success)
Phase 2b: TLS Cache Adaptive Sizing
- Dynamic capacity: 16-2048 slots based on usage
- High-water mark tracking + exponential growth/shrink
- Expected: +3-10% performance, -30-50% memory
Files: core/tiny_adaptive_sizing.c/h (new)
Phase 2c: BigCache Dynamic Hash Table
- Migrate from fixed 256×8 array to dynamic hash table
- Auto-resize: 256 → 512 → 1024 → 65,536 buckets
- Improved hash function (FNV-1a) + collision chaining
Files: core/hakmem_bigcache.c/h
Expected: +10-20% cache hit rate
Design Flaws Analysis:
- Identified 6 components with fixed-capacity bottlenecks
- SuperSlab (CRITICAL), TLS Cache (HIGH), BigCache/L2.5 (MEDIUM)
- Report: DESIGN_FLAWS_ANALYSIS.md (11 chapters)
Documentation:
- 13 comprehensive reports (PHASE*.md, DESIGN_FLAWS*.md)
- Implementation guides, test results, production readiness
- Bug fix reports, root cause analysis
Build System:
- Makefile: phase7 targets, PREWARM_TLS flag
- Auto dependency generation (-MMD -MP) for .inc files
Known Issues:
- 4T stability: 19/20 (95%) - investigating 1 failure for 100%
- L2.5 Pool dynamic sharding: design only (needs 2-3 days integration)
🤖 Generated with Claude Code (https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
|
2025-11-08 17:08:00 +09:00 |
|