diff --git a/CURRENT_TASK.md b/CURRENT_TASK.md
index 980aac6d..58617b24 100644
--- a/CURRENT_TASK.md
+++ b/CURRENT_TASK.md
@@ -1,8 +1,135 @@
-# Current Task: Phase 4 - Tiny Front Optimization
+# Current Task: Phase 5 - Mid/Large Allocation Optimization
+
+**Date**: 2025-11-29
+**Goal**: Mid/Large allocation gap elimination + Config Box application
+**Strategy**: Fix allocation gap (1KB-8KB) + Compile-time config + Mid MT optimization
+**Expected Gain**: +10-26% (57.2M → 63-72M ops/s)
+
+---
+
+## Phase 5 Overview: 5-Step Approach
+
+### Step 1: Mid MT Verification (Pending)
+- **Duration**: 2 days
+- **Risk**: Low
+- **Goal**: Verify Mid MT allocator handles 1KB-8KB range efficiently
+
+**Deliverables**:
+1. Benchmark Mid MT performance for 1KB-8KB sizes
+2. Identify any gaps or inefficiencies
+3. Document current Mid MT behavior
+
+---
+
+### Step 2: Allocation Gap Elimination (Pending)
+- **Duration**: 3 days
+- **Risk**: Medium
+- **Target**: +5-15% improvement
+- **Goal**: Route 1KB-8KB allocations through Mid MT instead of mmap fallback
+
+**Critical Issue**:
+- **File**: `core/box/hak_alloc_api.inc.h:171-216`
+- **Problem**: When ACE is disabled, 1KB-8KB allocations fall through to mmap()
+- **Impact**: 1000-5000x slower than O(1) allocation
+
+**Deliverables**:
+1. Fix routing logic in `hak_alloc_api.inc.h`
+2. Route all 1KB-8KB allocations through Mid MT
+3. Benchmark improvement
+4. Completion report
+
+---
+
+### Step 3: Mid/Large Config Box (Pending)
+- **Duration**: 3 days
+- **Risk**: Low
+- **Target**: +2-4% improvement
+- **Goal**: Apply Phase 4 Config Box pattern to Mid/Large feature gates
+
+**Runtime ENV Checks to Eliminate**:
+- `HAKMEM_SMALLMID_ENABLE` (SmallMid allocator gate)
+- `HAKMEM_POOL_TLS` (Pool allocator gate)
+- `HAKMEM_BIGCACHE` (BigCache gate)
+- `HAKMEM_ACE` (ACE allocator gate)
+- 4+ other feature checks in hot path
+
+**Deliverables**:
+1. `core/box/mid_large_config_box.h` - Reuse Phase 4 pattern
+2. Replace 5-8 runtime checks with compile-time macros
+3. Build flag: `HAKMEM_MID_LARGE_PGO=1`
+4. Benchmark improvement
+5. Completion report
+
+---
+
+### Step 4: Mid Registry Pre-allocation (Pending)
+- **Duration**: 2 days
+- **Risk**: Low
+- **Target**: Eliminate lock contention in MT workloads
+- **Goal**: Pre-allocate Mid MT registry at init instead of lazy allocation
+
+**Deliverables**:
+1. Modify `hakmem_mid_mt.c` init to pre-allocate registry
+2. Remove registry lock from hot path
+3. Benchmark MT workload improvement
+4. Completion report
+
+---
+
+### Step 5: Documentation & Final Benchmark (Pending)
+- **Duration**: 2 days
+- **Risk**: Low
+- **Goal**: Document Phase 5 results, prepare for Phase 6
+
+**Deliverables**:
+1. Phase 5 completion report
+2. Full benchmark suite comparison
+3. Update CURRENT_TASK.md for Phase 6
+4. Git commit & documentation
+
+---
+
+## Phase 5 Success Criteria
+
+**bench_random_mixed (ws=256)**:
+- Phase 4 result: 57.2M ops/s (Hot/Cold Box, no PGO)
+- Phase 5.1 (Gap fix): 60-65M ops/s (+5-15%)
+- Phase 5.2 (Config Box): 62-68M ops/s (+2-4% incremental)
+- Phase 5.3 (Registry): 63-70M ops/s (MT improvement)
+- **Phase 5 target**: **63-72M ops/s** ✓ (+10-26% cumulative)
+
+**Allocation Gap Impact**:
+- 1KB-8KB allocations: mmap() → Mid MT (1000-5000x faster)
+
+---
+
+## Current Status: Phase 5 Ready to Start
+
+**Phase 4 Complete** ✅:
+- Step 1: PGO Workflow Box (+6.25%)
+- Step 2: Hot/Cold Path Box (+7.3%)
+- Step 3: Front Config Box (+2.7-4.9%)
+- **Result**: 53.3M → 57.2M ops/s (+7.3%, without PGO)
+
+**Phase 5 Next Actions**:
+1. **Step 1**: Verify Mid MT for 1KB-8KB range (2 days)
+2. **Step 2**: Eliminate allocation gap (3 days)
+3. **Step 3**: Apply Config Box pattern (3 days)
+4. **Step 4**: Pre-allocate Mid registry (2 days)
+5. **Step 5**: Documentation & benchmarks (2 days)
+
+**Total Duration**: 12 days (about 2 weeks)
+
+---
+
+---
+
+# Previous: Phase 4 - Tiny Front Optimization ✅ COMPLETE
 
 **Date**: 2025-11-29
 **Goal**: Tiny allocation throughput 2x improvement (56.8M → 110M+ ops/s)
 **Strategy**: Boxification + PGO + Hot/Cold separation
+**Result**: 53.3M → 57.2M ops/s (+7.3%, without PGO)
 
 ---
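
Reviewer note: the plan above combines two ideas, size-based routing that closes the 1KB-8KB gap (Step 2) and a Config Box style gate that replaces a runtime environment check with a compile-time constant (Step 3). The following is a minimal sketch of how they might fit together, not the code in `hak_alloc_api.inc.h`. The names `route_alloc`, `route_t`, `ROUTE_*`, `TINY_MAX`, `MID_MT_MAX`, and `cfg_ace_enabled` are hypothetical, the size boundaries and the handling of sizes above 8KB are assumptions, and only the `HAKMEM_ACE` and `HAKMEM_MID_LARGE_PGO` names come from the plan.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size-class boundaries; the real thresholds live in the hakmem sources. */
#define TINY_MAX   1024u         /* up to 1KB: tiny front (assumed boundary) */
#define MID_MT_MAX (8u * 1024u)  /* above 1KB, up to 8KB: the gap Step 2 closes */

typedef enum { ROUTE_TINY, ROUTE_MID_MT, ROUTE_ACE, ROUTE_MMAP } route_t;

#if defined(HAKMEM_MID_LARGE_PGO) && HAKMEM_MID_LARGE_PGO
/* Config Box style: the gate is a constant, so the compiler can delete
 * the branch that depends on it. The value 0 is a placeholder. */
static inline int cfg_ace_enabled(void) { return 0; }
#else
/* Runtime fallback: read the environment once and cache the answer.
 * Sketch only; a real allocator would initialize this at startup,
 * before any other thread can race on the cached value. */
static inline int cfg_ace_enabled(void) {
    static int cached = -1;
    if (cached < 0) {
        const char *v = getenv("HAKMEM_ACE");
        cached = (v != NULL && strcmp(v, "0") != 0);
    }
    return cached;
}
#endif

/* Pick a backend for a request of `size` bytes. The Mid MT check comes
 * before the ACE gate, so 1KB-8KB requests reach Mid MT whether ACE is
 * enabled or not and never fall through to mmap. */
static route_t route_alloc(size_t size) {
    if (size <= TINY_MAX)   return ROUTE_TINY;
    if (size <= MID_MT_MAX) return ROUTE_MID_MT;
    if (cfg_ace_enabled())  return ROUTE_ACE;   /* assumed: ACE serves larger sizes */
    return ROUTE_MMAP;
}
```

Compiling with `-DHAKMEM_MID_LARGE_PGO=1` selects the constant gate. The plan does not say which value each gate should be pinned to, so the `0` returned here only illustrates the mechanism.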