7b7de53167
Phase FREE-FRONT-V3-1: Free route snapshot infrastructure + build fix
...
Summary:
========
Implemented Phase FREE-FRONT-V3 infrastructure to optimize free hotpath by:
1. Creating snapshot-based route decision table (consolidating route logic)
2. Removing redundant ENV checks from hot path
3. Preparing for future integration into hak_free_at()
Key Changes:
============
1. NEW FILES:
- core/box/free_front_v3_env_box.h: Route snapshot definition & API
- core/box/free_front_v3_env_box.c: Snapshot initialization & caching
2. Infrastructure Details:
- FreeRouteSnapshotV3: Maps class_idx → free_route_kind for all 8 classes
- Routes defined: LEGACY, TINY_V3, CORE_V6_C6, POOL_V1
- ENV-gated initialization (HAKMEM_TINY_FREE_FRONT_V3_ENABLED, default OFF)
- Per-thread TLS caching to avoid repeated ENV reads
3. Design Goals:
- Consolidate tiny_route_for_class() results into snapshot table
- Remove C7 ULTRA / v4 / v5 / v6 ENV checks from hot path
- Limit lookup (ss_fast_lookup/slab_index_for) to paths that truly need it
- Clear ownership boundary: front v3 handles routing, downstream handles free
4. Phase Plan:
- v3-1 ✅ COMPLETE: Infrastructure (snapshot table, ENV initialization, TLS cache)
- v3-2 (INFRASTRUCTURE ONLY): Placeholder integration in hak_free_api.inc.h
- v3-3 (FUTURE): Full integration + benchmark A/B to measure hotpath improvement
5. BUILD FIX:
- Added missing core/box/c7_meta_used_counter_box.o to OBJS_BASE in Makefile
- This symbol was referenced but not linked, causing undefined reference errors
- Benchmark targets now build cleanly without LTO
Status:
=======
- Build: ✅ PASS (bench_allocators_hakmem builds without errors)
- Integration: Currently DISABLED (default OFF, ready for v3-2 phase)
- No performance impact: Infrastructure-only, hotpath unchanged
Future Work:
============
- Phase v3-2: Integrate snapshot routing into hak_free_at() main path
- Phase v3-3: Measure free hotpath performance improvement (target: 1-2% less branch mispredict)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-11 19:17:30 +09:00
e2ca52d59d
Phase v6-6: Inline hot path optimization for SmallObject Core v6
...
Optimize v6 alloc/free by eliminating redundant route checks and adding
inline hot path functions:
- smallobject_core_v6_box.h: Add inline hot path functions:
- small_alloc_c6_hot_v6() / small_alloc_c5_hot_v6(): Direct TLS pop
- small_free_c6_hot_v6() / small_free_c5_hot_v6(): Direct TLS push
- No route check needed (caller already validated via switch case)
- smallobject_core_v6.c: Add cold path functions:
- small_alloc_cold_v6(): Handle TLS refill from page
- small_free_cold_v6(): Handle page freelist push (TLS full/cross-thread)
- malloc_tiny_fast.h: Update front gate to use inline hot path:
- Alloc: hot path first, cold path fallback on TLS miss
- Free: hot path first, cold path fallback on TLS full
Performance results:
- C5-heavy: v6 ON 42.2M ≈ baseline (parity restored)
- C6-heavy: v6 ON 34.5M ≈ baseline (parity restored)
- Mixed 16-1024B: ~26.5M (v3-only: ~28.1M, gap is routing overhead)
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2025-12-11 15:59:29 +09:00
c60199182e
Phase v6-1/2/3/4: SmallObject Core v6 - C6-only implementation + refactor
...
Phase v6-1: C6-only route stub (v1/pool fallback)
Phase v6-2: Segment v6 + ColdIface v6 + Core v6 HotPath implementation
- 2MiB segment / 64KiB page allocation
- O(1) ptr→page_meta lookup with segment masking
- C6-heavy A/B: SEGV-free but -44% performance (15.3M ops/s)
Phase v6-3: Thin-layer optimization (TLS ownership check + batch header + refill batching)
- TLS ownership fast-path skip page_meta for 90%+ of frees
- Batch header writes during refill (32 allocs = 1 header write)
- TLS batch refill (1/32 refill frequency)
- C6-heavy A/B: v6-2 15.3M → v6-3 27.1M ops/s (±0% vs baseline) ✅
Phase v6-4: Mixed hang fix (segment metadata lookup correction)
- Root cause: metadata lookup was reading mmap region instead of TLS slot
- Fix: use TLS slot descriptor with in_use validation
- Mixed health: 5M iterations SEGV-free, 35.8M ops/s ✅
Phase v6-refactor: Code quality improvements (macro unification + inline + docs)
- Add SMALL_V6_* prefix macros (header, pointer conversion, page index)
- Extract inline validation functions (small_page_v6_valid, small_ptr_in_segment_v6)
- Doxygen-style comments for all public functions
- Result: 0 compiler warnings, maintained +1.2% performance
Files:
- core/box/smallobject_core_v6_box.h (new, type & API definitions)
- core/box/smallobject_cold_iface_v6.h (new, cold iface API)
- core/box/smallsegment_v6_box.h (new, segment type definitions)
- core/smallobject_core_v6.c (new, C6 alloc/free implementation)
- core/smallobject_cold_iface_v6.c (new, refill/retire logic)
- core/smallsegment_v6.c (new, segment allocator)
- docs/analysis/SMALLOBJECT_CORE_V6_DESIGN.md (new, design document)
- core/box/tiny_route_env_box.h (modified, v6 route added)
- core/front/malloc_tiny_fast.h (modified, v6 case in route switch)
- Makefile (modified, v6 objects added)
- CURRENT_TASK.md (modified, v6 status added)
Status:
- C6-heavy: v6 OFF 27.1M → v6-3 ON 27.1M ops/s (±0%) ✅
- Mixed: v6 ON 35.8M ops/s (C6-only, other classes via v1) ✅
- Build: 0 warnings, fully documented ✅
🤖 Generated with Claude Code
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com >
2025-12-11 15:29:59 +09:00