docs: Start Phase 5 - Mid/Large Allocation Optimization

Update CURRENT_TASK.md with Phase 5 roadmap:
- Goal: +10-26% improvement (57.2M → 63-72M ops/s)
- Strategy: Fix allocation gap + Config Box + Mid MT optimization
- Duration: 12 days / 2 weeks

Phase 5 Steps:
1. Mid MT Verification (2 days)
2. Allocation Gap Elimination (3 days) - Priority 1
3. Mid/Large Config Box (3 days)
4. Mid Registry Pre-allocation (2 days)
5. Documentation & Benchmark (2 days)

Critical Issue Found:
- 1KB-8KB allocations fall through to mmap() when ACE disabled
- Impact: 1000-5000x slower than O(1) allocation
- Fix: Route through existing Mid MT allocator

Phase 4 Complete:
- Result: 53.3M → 57.2M ops/s (+7.3%)
- PGO deferred to final optimization phase

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
129
CURRENT_TASK.md
@@ -1,8 +1,135 @@
# Current Task: Phase 4 - Tiny Front Optimization
# Current Task: Phase 5 - Mid/Large Allocation Optimization

**Date**: 2025-11-29
**Goal**: Mid/Large allocation gap elimination + Config Box application
**Strategy**: Fix allocation gap (1KB-8KB) + Compile-time config + Mid MT optimization
**Expected Gain**: +10-26% (57.2M → 63-72M ops/s)

---

## Phase 5 Overview: 5-Step Approach

### Step 1: Mid MT Verification (Pending)
- **Duration**: 2 days
- **Risk**: Low
- **Goal**: Verify Mid MT allocator handles 1KB-8KB range efficiently

**Deliverables**:
1. Benchmark Mid MT performance for 1KB-8KB sizes
2. Identify any gaps or inefficiencies
3. Document current Mid MT behavior

---

### Step 2: Allocation Gap Elimination (Pending)
- **Duration**: 3 days
- **Risk**: Medium
- **Target**: +5-15% improvement
- **Goal**: Route 1KB-8KB allocations through Mid MT instead of mmap fallback

**Critical Issue**:
- **File**: `core/box/hak_alloc_api.inc.h:171-216`
- **Problem**: When ACE is disabled, 1KB-8KB allocations fall through to mmap()
- **Impact**: The mmap() fallback is 1000-5000x slower than O(1) allocation

**Deliverables**:
1. Fix routing logic in `hak_alloc_api.inc.h`
2. Route all 1KB-8KB allocations through Mid MT
3. Benchmark improvement
4. Completion report
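
The routing gap described in this step can be illustrated with a minimal sketch. It is not the code in `hak_alloc_api.inc.h`: the enum, the function names, the inclusive 1KB/8KB boundaries, and the assumption that ACE takes every non-tiny request when enabled are all hypothetical, chosen only to show how a request in the 1KB-8KB range can miss every allocator and land on mmap().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route tags; the real allocator's names may differ. */
typedef enum {
    ROUTE_TINY,    /* small sizes served by the Tiny front */
    ROUTE_MID_MT,  /* 1KB-8KB served by the Mid MT allocator */
    ROUTE_ACE,     /* served by ACE when it is enabled */
    ROUTE_MMAP     /* last-resort fallback */
} route_t;

/* Assumed boundaries of the gap (inclusive on both ends). */
#define MID_MIN_SIZE ((size_t)1024)
#define MID_MAX_SIZE ((size_t)8 * 1024)

/* Before the fix: with ACE disabled, nothing claims 1KB-8KB. */
route_t route_before_fix(size_t size, int ace_enabled) {
    if (size < MID_MIN_SIZE) return ROUTE_TINY;
    if (ace_enabled) return ROUTE_ACE;
    return ROUTE_MMAP;  /* the gap: 1KB-8KB requests end up here */
}

/* After the fix: Mid MT catches 1KB-8KB before the mmap fallback. */
route_t route_after_fix(size_t size, int ace_enabled) {
    if (size < MID_MIN_SIZE) return ROUTE_TINY;
    if (ace_enabled) return ROUTE_ACE;
    if (size <= MID_MAX_SIZE) return ROUTE_MID_MT;
    return ROUTE_MMAP;  /* only sizes above 8KB still fall through */
}

/* Small self-check showing the behavioral difference for a 4KB request. */
void route_self_check(void) {
    assert(route_before_fix(4096, 0) == ROUTE_MMAP);
    assert(route_after_fix(4096, 0) == ROUTE_MID_MT);
}
```

In this sketch only the ACE-disabled path changes, which is why a benchmark that runs with ACE enabled would not be expected to show the improvement.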

---

### Step 3: Mid/Large Config Box (Pending)
- **Duration**: 3 days
- **Risk**: Low
- **Target**: +2-4% improvement
- **Goal**: Apply Phase 4 Config Box pattern to Mid/Large feature gates

**Runtime ENV Checks to Eliminate**:
- `HAKMEM_SMALLMID_ENABLE` (SmallMid allocator gate)
- `HAKMEM_POOL_TLS` (Pool allocator gate)
- `HAKMEM_BIGCACHE` (BigCache gate)
- `HAKMEM_ACE` (ACE allocator gate)
- 4+ other feature checks in hot path

**Deliverables**:
1. `core/box/mid_large_config_box.h` - Reuse Phase 4 pattern
2. Replace 5-8 runtime checks with compile-time macros
3. Build flag: `HAKMEM_MID_LARGE_PGO=1`
4. Benchmark improvement
5. Completion report
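
As a rough illustration of the compile-time gate idea, the sketch below switches one feature check between a cached environment lookup and a constant, depending on the build flag. Only `HAKMEM_MID_LARGE_PGO` and `HAKMEM_BIGCACHE` come from the plan above; the helper names, the default-on behavior, and the choice to hard-wire the gate to 1 in the compile-time build are assumptions, and the actual `mid_large_config_box.h` may look quite different.

```c
#include <assert.h>
#include <stdlib.h>

/* Build flag named in the deliverables; defaults to the runtime path. */
#ifndef HAKMEM_MID_LARGE_PGO
#define HAKMEM_MID_LARGE_PGO 0
#endif

/* Hypothetical helper: interpret an environment variable as an on/off gate.
 * Unset or empty means "use the default"; a leading '0' means off. */
int gate_from_env(const char *name, int default_on) {
    const char *v = getenv(name);
    if (v == NULL || v[0] == '\0') return default_on;
    return v[0] != '0';
}

#if HAKMEM_MID_LARGE_PGO
/* Compile-time build: the gate is a constant, so the compiler can
 * fold the branch away entirely (assumed value: enabled). */
#define MID_LARGE_BIGCACHE_ENABLED() 1
#else
/* Runtime build: read the environment once and cache the result.
 * Simplification: the cache is not synchronized between threads. */
static inline int mid_large_bigcache_enabled_rt(void) {
    static int cached = -1;
    if (cached < 0) cached = gate_from_env("HAKMEM_BIGCACHE", 1);
    return cached;
}
#define MID_LARGE_BIGCACHE_ENABLED() mid_large_bigcache_enabled_rt()
#endif
```

Call sites would test `MID_LARGE_BIGCACHE_ENABLED()` in both builds, so switching modes is a matter of compiling with `-DHAKMEM_MID_LARGE_PGO=1` rather than editing the hot path.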

---

### Step 4: Mid Registry Pre-allocation (Pending)
- **Duration**: 2 days
- **Risk**: Low
- **Target**: Eliminate lock contention in MT workloads
- **Goal**: Pre-allocate Mid MT registry at init instead of lazy allocation

**Deliverables**:
1. Modify `hakmem_mid_mt.c` init to pre-allocate registry
2. Remove registry lock from hot path
3. Benchmark MT workload improvement
4. Completion report
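
One possible shape for a pre-allocated registry with a lock-free lookup is sketched below. The layout of the real registry in `hakmem_mid_mt.c` is not shown in this document, so the fixed capacity, the entry layout, the linear scan, and every name here are assumptions. The sketch reserves all storage up front and publishes entries with C11 atomics, so lookups never take a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define REGISTRY_CAPACITY 1024  /* assumed fixed capacity */

typedef struct {
    uintptr_t  base;   /* start address of a registered segment */
    size_t     len;    /* segment length in bytes */
    atomic_int ready;  /* 0 until base/len have been published */
} segment_entry_t;

/* Storage is reserved up front, so no allocation happens on the hot path. */
static segment_entry_t g_registry[REGISTRY_CAPACITY];
static atomic_size_t   g_registry_count;

/* Reset the table once at allocator init. */
void registry_init(void) {
    atomic_store(&g_registry_count, 0);
    for (size_t i = 0; i < REGISTRY_CAPACITY; i++) {
        g_registry[i].base = 0;
        g_registry[i].len = 0;
        atomic_store(&g_registry[i].ready, 0);
    }
}

/* Claim a slot with one atomic add, fill it, then publish it.
 * Returns 0 on success, -1 when the table is full. */
int registry_add(void *base, size_t len) {
    size_t slot = atomic_fetch_add(&g_registry_count, 1);
    if (slot >= REGISTRY_CAPACITY) return -1;
    g_registry[slot].base = (uintptr_t)base;
    g_registry[slot].len = len;
    atomic_store_explicit(&g_registry[slot].ready, 1, memory_order_release);
    return 0;
}

/* Lock-free lookup: scan claimed slots, skipping unpublished ones. */
int registry_contains(const void *p) {
    uintptr_t addr = (uintptr_t)p;
    size_t n = atomic_load(&g_registry_count);
    if (n > REGISTRY_CAPACITY) n = REGISTRY_CAPACITY;
    for (size_t i = 0; i < n; i++) {
        if (!atomic_load_explicit(&g_registry[i].ready, memory_order_acquire))
            continue;
        if (addr >= g_registry[i].base && addr - g_registry[i].base < g_registry[i].len)
            return 1;
    }
    return 0;
}
```

The sketch never removes entries; supporting removal or growth without reintroducing a lock would need additional design work that this plan does not describe.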

---

### Step 5: Documentation & Final Benchmark (Pending)
- **Duration**: 2 days
- **Risk**: Low
- **Goal**: Document Phase 5 results, prepare for Phase 6

**Deliverables**:
1. Phase 5 completion report
2. Full benchmark suite comparison
3. Update CURRENT_TASK.md for Phase 6
4. Git commit & documentation

---

## Phase 5 Success Criteria

**bench_random_mixed (ws=256)**:
- Phase 4 result: 57.2M ops/s (Hot/Cold Box, no PGO)
- Phase 5.1 (Gap fix, Step 2): 60-65M ops/s (+5-15%)
- Phase 5.2 (Config Box, Step 3): 62-68M ops/s (a further +2-4%)
- Phase 5.3 (Registry, Step 4): 63-70M ops/s (MT improvement)
- **Phase 5 target**: **63-72M ops/s** ✓ (+10-26% cumulative)

**Allocation Gap Impact**:
- 1KB-8KB allocations: mmap() → Mid MT (1000-5000x faster)
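
The cumulative percentages follow directly from the 57.2M ops/s baseline: 63M is roughly +10% and 72M is roughly +26%. A small helper, hypothetical and not part of the benchmark suite, makes the arithmetic easy to re-check when new numbers come in:

```c
#include <assert.h>

/* Percentage gain of a measured throughput over a baseline.
 * Both values use the same unit (here: million ops/s). */
double gain_percent(double baseline_mops, double measured_mops) {
    assert(baseline_mops > 0.0);
    return (measured_mops / baseline_mops - 1.0) * 100.0;
}

/* Throughput expected after applying a percentage gain to a baseline. */
double apply_gain(double baseline_mops, double percent) {
    return baseline_mops * (1.0 + percent / 100.0);
}
```

For example, `gain_percent(57.2, 63.0)` and `gain_percent(57.2, 72.0)` bracket the Phase 5 target, and `gain_percent(53.3, 57.2)` reproduces the Phase 4 result of about +7.3%.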

---

## Current Status: Phase 5 Ready to Start

**Phase 4 Complete** ✅:
- Step 1: PGO Workflow Box (+6.25%)
- Step 2: Hot/Cold Path Box (+7.3%)
- Step 3: Front Config Box (+2.7-4.9%)
- **Result**: 53.3M → 57.2M ops/s (+7.3%, without PGO)

**Phase 5 Next Actions**:
1. **Step 1**: Verify Mid MT for the 1KB-8KB range (2 days)
2. **Step 2**: Eliminate allocation gap (3 days)
3. **Step 3**: Apply Config Box pattern (3 days)
4. **Step 4**: Pre-allocate Mid registry (2 days)
5. **Step 5**: Documentation & benchmarks (2 days)

**Total Duration**: 12 days / 2 weeks

---

# Previous: Phase 4 - Tiny Front Optimization ✅ COMPLETE

**Date**: 2025-11-29
**Goal**: Tiny allocation throughput 2x improvement (56.8M → 110M+ ops/s)
**Strategy**: Boxification + PGO + Hot/Cold separation
**Result**: 53.3M → 57.2M ops/s (+7.3%, without PGO)

---