d54893ea1d
Phase 3 C3: Static Routing A/B Test ADOPT (+2.20% Mixed gain)
Steps 2 & 3 complete:
- A/B test (Mixed 10-run): STATIC_ROUTE=0 (38.91M) → =1 (39.77M) = +2.20% avg
- Median gain: +1.98%
- Result: ✅ GO (exceeds +1.0% threshold)
- Decision: ✅ ADOPT into MIXED_TINYV3_C7_SAFE preset
- bench_profile.h line 77: HAKMEM_TINY_STATIC_ROUTE=1 default
- Learner auto-disables static route when HAKMEM_SMALL_LEARNER_V7_ENABLED=1
Implementation Summary:
- core/box/tiny_static_route_box.{h,c}: Research box (Step 1A)
- core/front/malloc_tiny_fast.h: Route lookup integration (Step 1B, lines 249-256; sketch after this list)
- core/bench_profile.h: Bench sync + preset adoption
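A minimal sketch of the static-route idea described above, assuming illustrative names: only the two ENV flags (HAKMEM_TINY_STATIC_ROUTE, HAKMEM_SMALL_LEARNER_V7_ENABLED) are from this commit; everything else is a hypothetical shape, not the tiny_static_route_box implementation.
```c
// Hypothetical static size-class -> route table consulted on the tiny alloc fast path.
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum { TINY_ROUTE_LEGACY = 0, TINY_ROUTE_SLL = 1 };

static uint8_t g_static_route[8];   /* one frozen route per tiny size class */
static bool    g_static_route_on;   /* decided once at startup */

static void tiny_static_route_init(void) {
    const char *on      = getenv("HAKMEM_TINY_STATIC_ROUTE");
    const char *learner = getenv("HAKMEM_SMALL_LEARNER_V7_ENABLED");
    /* Learner takes precedence: a dynamic learner and a frozen table would fight. */
    g_static_route_on = (on && on[0] == '1') && !(learner && learner[0] == '1');
    for (int i = 0; i < 8; i++) g_static_route[i] = TINY_ROUTE_LEGACY;
}

static inline int tiny_route_for_class(int size_class) {
    /* Hot path: one predictable table load instead of per-call routing logic. */
    return g_static_route_on ? g_static_route[size_class] : TINY_ROUTE_LEGACY;
}
```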
Cumulative Phase 2-3 Gains:
- B3 (Routing shape): +2.89%
- B4 (Wrapper split): +1.47%
- C3 (Static routing): +2.20%
- Total: ~6.8% (35.2M → ~39.8M ops/s)
Next: Phase 3 C1 (TLS Prefetch, expected +2-4%)
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 18:46:11 +09:00
0feeccdcef
Phase 2 B1/B3/B4 preparation: Analysis & ENV gate setup
## Phase 2 Optimization Research Complete
### B1 (Header tax reduction v2) - NO-GO
- HAKMEM_TINY_HEADER_MODE=LIGHT: -2.54% regression on Mixed
- Decision: FREEZE as research box (ENV opt-in only)
### B3 (Routing branch shape optimization) - ADOPT
- Mixed: +2.89% (48.41M → 49.80M ops/s)
- C6-heavy: +9.13% (8.97M → 9.79M ops/s)
- Strategy: LIKELY hint on the hot LEGACY route; rare routes moved to a noinline,cold helper (sketch below)
- Implementation: Already in malloc_tiny_fast.h:252-267
- Profile updates: HAKMEM_TINY_ALLOC_ROUTE_SHAPE=1 now default
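An illustrative branch-shape sketch, not the actual malloc_tiny_fast.h code: LIKELY keeps the LEGACY route on the straight-line hot path, while rare routes are pushed into a noinline,cold helper so they stay out of the hot I-cache lines. All function names and stub bodies below are placeholders.
```c
#include <stddef.h>
#include <stdlib.h>

#define HAK_LIKELY(x) __builtin_expect(!!(x), 1)

enum { ROUTE_LEGACY = 0 };            /* illustrative route id */

__attribute__((noinline, cold))
static void *tiny_alloc_rare_route(size_t size, int route) {
    (void)route;
    return malloc(size);              /* stub: real code dispatches to SLL etc. */
}

static void *tiny_alloc_legacy(size_t size) {
    return malloc(size);              /* stub for the hot LEGACY allocator entry */
}

static inline void *tiny_alloc_route(size_t size, int route) {
    if (HAK_LIKELY(route == ROUTE_LEGACY))
        return tiny_alloc_legacy(size);          /* predicted-taken common case */
    return tiny_alloc_rare_route(size, route);   /* rare routes go out of line */
}
```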
### B4 (Wrapper Layer Hot/Cold Split) - Preparation
- Design memo: docs/analysis/PHASE2_B4_WRAPPER_SHAPE_1_DESIGN.md
- Goal: Split the malloc/free wrappers into hot/cold paths to reduce I-cache pressure (sketch below)
- ENV gate: HAKMEM_WRAP_SHAPE=0/1 (added to wrapper_env_box)
- Expected gain: +2-5% Mixed, +1-3% C6-heavy
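A hedged sketch of the planned hot/cold wrapper split behind the HAKMEM_WRAP_SHAPE gate; every function below is a hypothetical placeholder with a stub body, not the real wrapper API.
```c
#include <stddef.h>

static int g_wrap_shape = 1;          /* cached HAKMEM_WRAP_SHAPE (0/1) */

__attribute__((noinline, cold))
static void *wrap_malloc_cold(size_t size) {
    (void)size;
    return NULL;                      /* stub: libc fallback, diagnostics, rare sizes */
}

static void *wrap_tiny_try(size_t size) {
    (void)size;
    return NULL;                      /* stub: tiny fast-path attempt */
}

static void *wrap_malloc_legacy(size_t size) {
    (void)size;
    return NULL;                      /* stub: the current, unsplit wrapper body */
}

static void *hak_malloc_wrapper(size_t size) {
    if (!g_wrap_shape)
        return wrap_malloc_legacy(size);         /* gate off: unchanged behavior */
    void *p = wrap_tiny_try(size);               /* hot body stays small and branch-light */
    if (__builtin_expect(p != NULL, 1)) return p;
    return wrap_malloc_cold(size);               /* everything unusual goes out of line */
}
```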
## Analysis Summary
- The effect is visible: the FREE DUALHOT and B3 routing optimizations both contribute
- Code layering is clean: winning boxes are promoted to presets, losing boxes stay frozen behind ENV guards
- The remaining gap to mimalloc sits in the wrapper layer, safety checks, and policy snapshot
- A further +5-10% is still realistically achievable
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-13 16:46:18 +09:00
644e3c30d1
feat(Phase 2-1): Lane Classification + Fallback Reduction
## Phase 2-1: Lane Classification Box (Single Source of Truth)
### New Module: hak_lane_classify.inc.h
- Centralized size-to-lane mapping with unified boundary definitions (sketch after this list)
- Lane architecture:
- LANE_TINY: [0, 1024B] SuperSlab (unchanged)
- LANE_POOL: [1025B, 52KB] per-thread Pool (range extended!)
- LANE_ACE: [52KB, 2MB] ACE learning
- LANE_HUGE: [2MB+] mmap direct
- Key invariant: POOL_MIN = TINY_MAX + 1 (no gaps)
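A minimal classifier sketch following the boundaries above. LANE_TINY_MAX and POOL_MIN_REQUEST_SIZE are the names used in this commit; the enum and function shape are illustrative, not the actual hak_lane_classify.inc.h contents.
```c
#include <stddef.h>

#define LANE_TINY_MAX         1024u
#define POOL_MIN_REQUEST_SIZE (LANE_TINY_MAX + 1u)   /* invariant: no gap */
#define LANE_POOL_MAX         (52u * 1024u)
#define LANE_ACE_MAX          (2u * 1024u * 1024u)

typedef enum { LANE_TINY, LANE_POOL, LANE_ACE, LANE_HUGE } hak_lane_t;

static inline hak_lane_t hak_lane_classify(size_t size) {
    if (size <= LANE_TINY_MAX) return LANE_TINY;   /* SuperSlab */
    if (size <= LANE_POOL_MAX) return LANE_POOL;   /* per-thread Pool */
    if (size <= LANE_ACE_MAX)  return LANE_ACE;    /* ACE learning */
    return LANE_HUGE;                              /* mmap direct */
}
```
Keeping every boundary in one place is what makes the POOL_MIN = TINY_MAX + 1 invariant hard to break by editing one header and forgetting the other.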
### Fixed: Tiny/Pool Boundary Mismatch
- Before: TINY_MAX_SIZE=1024 vs tiny_get_max_size()=2047 (inconsistent!)
- After: Both reference LANE_TINY_MAX=1024 (authoritative)
- Impact: Eliminates the 1025-2047B "unmanaged zone" that was causing libc fragmentation
### Updated Files
- core/hakmem_tiny.h: Use LANE_TINY_MAX, fix sizes[7]=1024 (was 2047)
- core/hakmem_pool.h: Use POOL_MIN_REQUEST_SIZE=1025 (was 2048)
- core/box/hak_alloc_api.inc.h: Lane-based routing (HAK_LANE_IS_*)
## jemalloc Block Bug Fix
### Root Cause
- g_jemalloc_loaded initialized to -1 (unknown)
- Condition `if (block && g_jemalloc_loaded)` treated -1 as true
- Result: ALL allocations fell back to libc (even when jemalloc was not loaded!)
### Fix
- Change the condition to `g_jemalloc_loaded > 0` (sketch below)
- Fall back only when jemalloc is ACTUALLY loaded
- Applied to: malloc/free/calloc/realloc
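Sketch of the tri-state check: `g_jemalloc_loaded` is the variable named in this commit, while the helper wrapped around it is illustrative.
```c
static int g_jemalloc_loaded = -1;   /* -1 = not probed yet, 0 = absent, 1 = present */

static int should_fallback_for_jemalloc(int block) {
    /* Before: `block && g_jemalloc_loaded` treated the -1 "unknown" state as true,
     * so every allocation fell back to libc. After: fall back only when jemalloc
     * is known to be loaded. */
    return block && (g_jemalloc_loaded > 0);
}
```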
### Impact
- Before: 100% libc fallback (jemalloc block false positive)
- After: Only genuine cases fallback (init_wait, lockdepth, etc.)
## Fallback Diagnostics (ChatGPT contribution)
### New Feature: HAKMEM_WRAP_DIAG
- ENV flag to enable fallback logging
- Reason-specific counters (init_wait, jemalloc_block, lockdepth, etc.; sketch below)
- First 4 occurrences logged per reason
- Helps identify unwanted fallback paths
### Implementation
- core/box/wrapper_env_box.{c,h}: ENV cache + DIAG flag
- core/box/hak_wrappers.inc.h: wrapper_record_fallback() calls
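A hedged sketch of what reason-tagged diagnostics can look like: per-reason counters plus logging of only the first few hits, gated by HAKMEM_WRAP_DIAG. wrapper_record_fallback() is the name from this commit; the enum values, counters, and log format are illustrative.
```c
#include <stdatomic.h>
#include <stdio.h>

typedef enum { FB_INIT_WAIT, FB_JEMALLOC_BLOCK, FB_LOCKDEPTH, FB_REASON_COUNT } fb_reason_t;

static _Atomic unsigned long g_fb_count[FB_REASON_COUNT];
static const char *const g_fb_name[FB_REASON_COUNT] =
    { "init_wait", "jemalloc_block", "lockdepth" };
static int g_wrap_diag;   /* cached from HAKMEM_WRAP_DIAG at constructor time */

static void wrapper_record_fallback(fb_reason_t reason) {
    unsigned long n = atomic_fetch_add_explicit(&g_fb_count[reason], 1,
                                                memory_order_relaxed);
    if (g_wrap_diag && n < 4)   /* log only the first 4 occurrences per reason */
        fprintf(stderr, "[wrap] fallback: %s (#%lu)\n", g_fb_name[reason], n + 1);
}
```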
## Verification
### Fallback Reduction
- Before fix: [wrap] libc malloc: jemalloc block (100% fallback)
- After fix: Only init_wait + lockdepth (expected, minimal)
### Known Issue
- Tiny allocator OOM (size=8) still crashes
- This is a pre-existing bug, unrelated to Phase 2-1
- Was hidden by jemalloc block false positive
- Will be investigated separately
## Performance Impact
### sh8bench 8 threads
- Phase 1-1: 15 s
- Phase 2-1: 14 s (~7% improvement)
### Note
- True hakmem performance now measurable (no more 100% fallback)
- Tiny OOM prevents full benchmark completion
- Next: Fix Tiny allocator for complete evaluation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-12-02 19:13:28 +09:00
49969d2e0f
feat(Phase 1-1): Complete getenv elimination from malloc/free hot paths (+39-42% perf)
## Summary
Eliminated all getenv() calls from malloc/free wrappers and allocator hot paths by implementing
constructor-based environment variable caching. This achieves a 39-42% performance improvement
(36s → 22s on sh8bench single-thread).
## Performance Impact
- sh8bench 1 thread: 35-36s → 21-22s (+39-42% improvement) 🚀
- sh8bench 8 threads: ~15s (maintained)
- getenv overhead: 36.32% → 0% (completely eliminated)
## Changes
### New Files
- **core/box/tiny_env_box.{c,h}**: Centralized environment variable cache for Tiny allocator
- Caches 43 environment variables (HAKMEM_TINY_*, HAKMEM_SLL_*, HAKMEM_SS_*, etc.)
- Constructor-based initialization with atomic CAS for thread safety
- Inline accessor tiny_env_cfg() for hot path access
- **core/box/wrapper_env_box.{c,h}**: Environment cache for malloc/free wrappers (sketch after this list)
- Caches 3 wrapper variables (HAKMEM_STEP_TRACE, HAKMEM_LD_SAFE, HAKMEM_FREE_WRAP_TRACE)
- Constructor priority 101 ensures early initialization
- Replaces all lazy-init patterns in wrapper code
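A sketch of the cache shape these boxes imply: a plain struct filled once by a constructor and read through an inline accessor afterwards. The struct type and field names are illustrative; only the three ENV variables above and the wrapper_env_cfg() accessor name are from this commit.
```c
#include <stdbool.h>

typedef struct {
    bool step_trace;       /* HAKMEM_STEP_TRACE */
    bool ld_safe_mode;     /* HAKMEM_LD_SAFE */
    bool free_wrap_trace;  /* HAKMEM_FREE_WRAP_TRACE */
} wrapper_env_cfg_t;

static wrapper_env_cfg_t g_wrapper_env;   /* written exactly once in the constructor */

static inline const wrapper_env_cfg_t *wrapper_env_cfg(void) {
    return &g_wrapper_env;   /* hot-path cost: one pointer dereference, no getenv */
}
```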
### Modified Files
- **Makefile**: Added tiny_env_box.o and wrapper_env_box.o to OBJS_BASE and SHARED_OBJS
- **core/box/hak_wrappers.inc.h**:
- Removed static lazy-init variables (g_step_trace, ld_safe_mode cache)
- Replaced with wrapper_env_cfg() lookups (wcfg->step_trace, wcfg->ld_safe_mode)
- All getenv() calls eliminated from malloc/free hot paths
- **core/hakmem.c**:
- Added hak_ld_env_init() with constructor for LD_PRELOAD caching
- Added hak_force_libc_ctor() for HAKMEM_FORCE_LIBC_ALLOC* caching
- Simplified hak_ld_env_mode() to return cached value only
- Simplified hak_force_libc_alloc() to use cached values
- Eliminated all getenv/atoi calls from hot paths
## Technical Details
### Constructor Initialization Pattern
All environment variables are now read once at library load time using __attribute__((constructor)):
```c
__attribute__((constructor(101)))
static void wrapper_env_ctor(void) {
    wrapper_env_init_once(); // Atomic CAS ensures exactly-once init
}
```
### Thread Safety
- Atomic compare-and-swap (CAS) ensures single initialization
- Spin-wait for initialization completion in multi-threaded scenarios
- Memory barriers (memory_order_acq_rel) ensure visibility
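A minimal init-once sketch matching that description: a CAS to claim initialization, a release store to publish, and an acquire spin for late arrivals. The state values and cached field are illustrative, not the exact wrapper_env_box internals.
```c
#include <stdatomic.h>
#include <stdlib.h>

static _Atomic int g_env_state;                 /* 0 = uninit, 1 = initializing, 2 = ready */
static struct { int step_trace; } g_env_cache;  /* illustrative cached value */

static void wrapper_env_init_once(void) {
    int expected = 0;
    if (atomic_compare_exchange_strong_explicit(&g_env_state, &expected, 1,
                                                memory_order_acq_rel,
                                                memory_order_acquire)) {
        /* Winner: read the environment exactly once. */
        const char *e = getenv("HAKMEM_STEP_TRACE");
        g_env_cache.step_trace = (e != NULL && e[0] == '1');
        atomic_store_explicit(&g_env_state, 2, memory_order_release);   /* publish */
    } else {
        /* Losers: spin until the winner publishes the cache. */
        while (atomic_load_explicit(&g_env_state, memory_order_acquire) != 2)
            ;   /* spin-wait */
    }
}
```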
### Hot Path Impact
Before: Every malloc/free → getenv("LD_PRELOAD") + getenv("HAKMEM_STEP_TRACE") + ...
After: Every malloc/free → Single pointer dereference (wcfg->field)
## Next Optimization Target (Phase 1-2)
Perf analysis reveals libc fallback accounts for ~51% of cycles:
- _int_malloc: 15.04%
- malloc: 9.81%
- _int_free: 10.07%
- malloc_consolidate: 9.27%
- unlink_chunk: 6.82%
Reducing libc fallback from 51% → 10% could yield an additional +25-30% improvement.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: ChatGPT <chatgpt@openai.com>
2025-12-02 16:16:51 +09:00