492 lines
14 KiB
Markdown
492 lines
14 KiB
Markdown
|
|
# Warm Pool Architecture - Visual Summary & Decision Framework
|
|||
|
|
## 2025-12-04
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 🎯 The Core Problem
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Current Random Mixed Performance: 1.06M ops/s
|
|||
|
|
|
|||
|
|
What's happening on EVERY CACHE MISS (~5% of allocations):
|
|||
|
|
|
|||
|
|
malloc_tiny_fast(size)
|
|||
|
|
↓
|
|||
|
|
tiny_cold_refill_and_alloc() ← Called ~53,000 times per 1M allocs
|
|||
|
|
↓
|
|||
|
|
unified_cache_refill()
|
|||
|
|
↓
|
|||
|
|
Linear registry scan (O(N)) ← BOTTLENECK!
|
|||
|
|
├─ Search per-class registry
|
|||
|
|
├─ Check tier of each SuperSlab
|
|||
|
|
├─ Find first HOT one
|
|||
|
|
├─ Cost: 50-100 cycles per miss
|
|||
|
|
└─ Impact: ~5% of ops doing expensive work
|
|||
|
|
↓
|
|||
|
|
Carve ~64 blocks (fast)
|
|||
|
|
↓
|
|||
|
|
Return first block
|
|||
|
|
|
|||
|
|
Total cache miss cost: ~500-1000 cycles per miss
|
|||
|
|
Amortized: ~5-10 cycles per object
|
|||
|
|
Multiplied over 5% misses: SIGNIFICANT OVERHEAD
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 💡 The Warm Pool Solution
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
BEFORE (Current):
|
|||
|
|
Cache miss → Registry scan (O(N)) → Find HOT → Carve → Return
|
|||
|
|
|
|||
|
|
AFTER (Warm Pool):
|
|||
|
|
Cache miss → Warm pool pop (O(1)) → Already HOT → Carve → Return
|
|||
|
|
↑
|
|||
|
|
Pre-allocated SuperSlabs
|
|||
|
|
stored per-thread
|
|||
|
|
(TLS)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### The Warm Pool Concept
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Per-thread data structure:
|
|||
|
|
|
|||
|
|
g_tiny_warm_pool[TINY_NUM_CLASSES]: // For each size class
|
|||
|
|
.slabs[]: // Array of pre-allocated SuperSlabs
|
|||
|
|
.count: // How many are in pool
|
|||
|
|
.capacity: // Max capacity (typically 4)
|
|||
|
|
|
|||
|
|
For a 64-byte allocation (class 2):
|
|||
|
|
|
|||
|
|
If warm_pool[2].count > 0: ← FAST PATH
|
|||
|
|
Pop ss = warm_pool[2].slabs[--count]
|
|||
|
|
Carve blocks
|
|||
|
|
Return
|
|||
|
|
Cost: ~50 cycles per batch (5 per object)
|
|||
|
|
|
|||
|
|
Else: ← FALLBACK
|
|||
|
|
Registry scan (old path)
|
|||
|
|
Cost: ~500 cycles per batch
|
|||
|
|
(But RARE because pool is usually full)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 📊 Performance Model
|
|||
|
|
|
|||
|
|
### Current (Registry Scan Every Miss)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Scenario: 1M allocations, 5% cache miss rate = 50,000 misses
|
|||
|
|
|
|||
|
|
Hot path (95%): 950,000 allocs × 25 cycles = 23.75M cycles
|
|||
|
|
Warm path (5%): 50,000 batches × 1000 cycles = 50M cycles
|
|||
|
|
Other overhead: ~15M cycles
|
|||
|
|
─────────────────────────────────────────────────
|
|||
|
|
Total: ~70.4M cycles
|
|||
|
|
~1.06M ops/s
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Proposed (Warm Pool, 90% Hit)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Scenario: 1M allocations, 5% cache miss rate
|
|||
|
|
|
|||
|
|
Hot path (95%): 950,000 allocs × 25 cycles = 23.75M cycles
|
|||
|
|
|
|||
|
|
Warm path (5%):
|
|||
|
|
├─ 90% warm pool hits: 45,000 batches × 100 cycles = 4.5M cycles
|
|||
|
|
└─ 10% registry falls: 5,000 batches × 1000 cycles = 5M cycles
|
|||
|
|
├─ Sub-total: 9.5M cycles (vs 50M before)
|
|||
|
|
|
|||
|
|
Other overhead: ~15M cycles
|
|||
|
|
─────────────────────────────────────────────────
|
|||
|
|
Total: ~48M cycles
|
|||
|
|
~1.46M ops/s (+38%)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### With Additional Optimizations (Lock-free, Batched Tier Checks)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Hot path (95%): 950,000 allocs × 25 cycles = 23.75M cycles
|
|||
|
|
Warm path (5%):
|
|||
|
|
├─ 95% warm pool hits: 47,500 batches × 75 cycles = 3.56M cycles
|
|||
|
|
└─ 5% registry falls: 2,500 batches × 800 cycles = 2M cycles
|
|||
|
|
├─ Sub-total: 5.56M cycles
|
|||
|
|
Other overhead: ~10M cycles
|
|||
|
|
─────────────────────────────────────────────────
|
|||
|
|
Total: ~39M cycles
|
|||
|
|
~1.79M ops/s (+69%)
|
|||
|
|
|
|||
|
|
Further optimizations (per-thread pools, batch pre-alloc):
|
|||
|
|
Potential ceiling: ~2.5-3.0M ops/s (+135-180%)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 🔄 Warm Pool Data Flow
|
|||
|
|
|
|||
|
|
### Thread Startup
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Thread calls malloc() for first time:
|
|||
|
|
↓
|
|||
|
|
Check if warm_pool[class].capacity == 0:
|
|||
|
|
├─ YES → Initialize warm pools
|
|||
|
|
│ ├─ Set capacity = 4 per class
|
|||
|
|
│ ├─ Allocate array space (TLS, ~128KB total)
|
|||
|
|
│ ├─ Try to pre-populate from LRU cache
|
|||
|
|
│ │ ├─ Success: Get 2-3 SuperSlabs per class from LRU
|
|||
|
|
│ │ └─ Fail: Leave empty (will populate on cold alloc)
|
|||
|
|
│ └─ Ready!
|
|||
|
|
│
|
|||
|
|
└─ NO → Already initialized, continue
|
|||
|
|
|
|||
|
|
First allocation:
|
|||
|
|
├─ HOT: Unified cache hit → Return (99% of time)
|
|||
|
|
│
|
|||
|
|
└─ WARM (on cache miss):
|
|||
|
|
├─ warm_pool_pop(class) returns SuperSlab
|
|||
|
|
├─ If NULL (pool empty, rare):
|
|||
|
|
│ └─ Fall back to registry scan
|
|||
|
|
└─ Carve & return
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Steady State Execution
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
For each allocation:
|
|||
|
|
|
|||
|
|
malloc(size)
|
|||
|
|
├─ size → class_idx
|
|||
|
|
│
|
|||
|
|
├─ HOT: Unified cache hit (head != tail)?
|
|||
|
|
│ └─ YES (95%): Return immediately
|
|||
|
|
│
|
|||
|
|
└─ WARM: Unified cache miss (head == tail)?
|
|||
|
|
├─ Call unified_cache_refill(class_idx)
|
|||
|
|
│ ├─ SuperSlab ss = tiny_warm_pool_pop(class_idx)
|
|||
|
|
│ ├─ If ss != NULL (90% of misses):
|
|||
|
|
│ │ ├─ Carve ~64 blocks from ss
|
|||
|
|
│ │ ├─ Refill Unified Cache array
|
|||
|
|
│ │ └─ Return first block
|
|||
|
|
│ │
|
|||
|
|
│ └─ Else (10% of misses):
|
|||
|
|
│ ├─ Fall back to registry scan (COLD path)
|
|||
|
|
│ ├─ Find HOT SuperSlab in per-class registry
|
|||
|
|
│ ├─ Allocate new if not found (mmap)
|
|||
|
|
│ ├─ Carve blocks + refill warm pool
|
|||
|
|
│ └─ Return first block
|
|||
|
|
│
|
|||
|
|
└─ Return USER pointer
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Free Path Integration
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
free(ptr)
|
|||
|
|
├─ tiny_hot_free_fast()
|
|||
|
|
│ ├─ Push to TLS SLL (99% of time)
|
|||
|
|
│ └─ Return
|
|||
|
|
│
|
|||
|
|
└─ (On SLL full, triggered once per ~256 frees)
|
|||
|
|
├─ Batch drain SLL to SuperSlab freelist
|
|||
|
|
├─ When SuperSlab becomes empty:
|
|||
|
|
│ ├─ Remove from refill registry
|
|||
|
|
│ ├─ Push to LRU cache (NOT warm pool)
|
|||
|
|
│ │ (LRU will eventually evict or reuse)
|
|||
|
|
│ └─ When LRU reuses: add to warm pool
|
|||
|
|
│
|
|||
|
|
└─ Return
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Warm Pool Replenishment (Background)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
When warm_pool[class].count drops below threshold (1):
|
|||
|
|
├─ Called from cold allocation path (rare)
|
|||
|
|
│
|
|||
|
|
├─ For next 2-3 SuperSlabs in registry:
|
|||
|
|
│ ├─ Check if tier is still HOT
|
|||
|
|
│ ├─ Add to warm pool (up to capacity)
|
|||
|
|
│ └─ Continue registry scan
|
|||
|
|
│
|
|||
|
|
└─ Restore warm pool for next miss
|
|||
|
|
|
|||
|
|
No explicit background thread needed!
|
|||
|
|
Warm pool is refilled as side effect of cold allocs.
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## ⚡ Implementation Complexity vs Gain
|
|||
|
|
|
|||
|
|
### Low Complexity (Recommended)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Effort: 200-300 lines of code
|
|||
|
|
Time: 2-3 developer-days
|
|||
|
|
Risk: Low
|
|||
|
|
|
|||
|
|
Changes:
|
|||
|
|
1. Create tiny_warm_pool.h header (~50 lines)
|
|||
|
|
2. Declare __thread warm pools (~10 lines)
|
|||
|
|
3. Modify unified_cache_refill() (~100 lines)
|
|||
|
|
- Try warm_pool_pop() first
|
|||
|
|
- On success: carve & return
|
|||
|
|
- On fail: registry scan (existing code path)
|
|||
|
|
4. Add initialization in malloc (~20 lines)
|
|||
|
|
5. Add cleanup on thread exit (~10 lines)
|
|||
|
|
|
|||
|
|
Expected gain: +40-50% (1.06M → 1.5M ops/s)
|
|||
|
|
Risk: Very low (warm pool is additive, fallback to registry always works)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Medium Complexity (Phase 2)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Effort: 500-700 lines of code
|
|||
|
|
Time: 5-7 developer-days
|
|||
|
|
Risk: Medium
|
|||
|
|
|
|||
|
|
Changes:
|
|||
|
|
1. Lock-free warm pool using CAS
|
|||
|
|
2. Batched tier transition checks
|
|||
|
|
3. Per-thread allocation pool
|
|||
|
|
4. Background warm pool refill thread
|
|||
|
|
|
|||
|
|
Expected gain: +70-100% (1.06M → 1.8-2.1M ops/s)
|
|||
|
|
Risk: Medium (requires careful synchronization)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### High Complexity (Phase 3)
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Effort: 1000+ lines
|
|||
|
|
Time: 2-3 weeks
|
|||
|
|
Risk: High
|
|||
|
|
|
|||
|
|
Changes:
|
|||
|
|
1. Comprehensive redesign with three separate pools per thread
|
|||
|
|
2. Lock-free fast path for entire allocation
|
|||
|
|
3. Per-size-class threads for refill
|
|||
|
|
4. Complex tier management
|
|||
|
|
|
|||
|
|
Expected gain: +150-200% (1.06M → 2.5-3.2M ops/s)
|
|||
|
|
Risk: High (major architectural changes, potential correctness issues)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 🎓 Why 10x is Hard (But 2x is Feasible)
|
|||
|
|
|
|||
|
|
### The 80x Gap: Random Mixed vs Tiny Hot
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Tiny Hot: 89M ops/s
|
|||
|
|
├─ Single fixed size (16 bytes)
|
|||
|
|
├─ L1 cache perfect hit
|
|||
|
|
├─ No pool lookup
|
|||
|
|
├─ No routing
|
|||
|
|
├─ No page faults
|
|||
|
|
└─ Ideal case
|
|||
|
|
|
|||
|
|
Random Mixed: 1.06M ops/s
|
|||
|
|
├─ 256 different sizes
|
|||
|
|
├─ L1 cache misses
|
|||
|
|
├─ Pool routing needed
|
|||
|
|
├─ Registry lookup on miss
|
|||
|
|
├─ ~7,600 page faults
|
|||
|
|
└─ Real-world case
|
|||
|
|
|
|||
|
|
Difference: 83x
|
|||
|
|
|
|||
|
|
Can we close this gap?
|
|||
|
|
- Warm pool optimization: +40-50% (to 1.5-1.6M)
|
|||
|
|
- Lock-free pools: +20-30% (to 1.8-2.0M)
|
|||
|
|
- Per-thread pools: +10-15% (to 2.0-2.3M)
|
|||
|
|
- Other tuning: +5-10% (to 2.1-2.5M)
|
|||
|
|
──────────────────────────────────
|
|||
|
|
Total realistic: 2.0-2.5x (still 35-40x below Tiny Hot)
|
|||
|
|
|
|||
|
|
Why not 10x?
|
|||
|
|
1. Fundamental overhead: 256 size classes (not 1)
|
|||
|
|
2. Working set: Pages faults (7,600) are unavoidable
|
|||
|
|
3. Routing: Pool lookup adds cycles (can't eliminate)
|
|||
|
|
4. Tier management: Utilization tracking costs (necessary for correctness)
|
|||
|
|
5. Memory: 2MB SuperSlab fragmentation (not tunable)
|
|||
|
|
|
|||
|
|
The 10x gap is ARCHITECTURAL, not a bug.
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 📋 Implementation Phases
|
|||
|
|
|
|||
|
|
### ✅ Phase 1: Basic Warm Pool (THIS PROPOSAL)
|
|||
|
|
- **Goal:** +40-50% improvement (1.06M → 1.5M ops/s)
|
|||
|
|
- **Scope:** Warm pool data structure + unified_cache_refill() integration
|
|||
|
|
- **Risk:** Low
|
|||
|
|
- **Timeline:** 2-3 days
|
|||
|
|
- **Recommended:** YES (high ROI)
|
|||
|
|
|
|||
|
|
### ⏳ Phase 2: Advanced Optimizations (Optional)
|
|||
|
|
- **Goal:** +20-30% additional (1.5M → 1.8-2.0M ops/s)
|
|||
|
|
- **Scope:** Lock-free pools, batched tier checks, per-thread refill
|
|||
|
|
- **Risk:** Medium
|
|||
|
|
- **Timeline:** 1-2 weeks
|
|||
|
|
- **Recommended:** Maybe (depends on user requirements)
|
|||
|
|
|
|||
|
|
### ❌ Phase 3: Architectural Redesign (Not Recommended)
|
|||
|
|
- **Goal:** +100%+ improvement (2.0M+ ops/s)
|
|||
|
|
- **Scope:** Major rewrite of allocation path
|
|||
|
|
- **Risk:** High
|
|||
|
|
- **Timeline:** 3-4 weeks
|
|||
|
|
- **Recommended:** No (diminishing returns, high risk)
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 🔐 Safety & Correctness
|
|||
|
|
|
|||
|
|
### Thread Safety
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Warm pool is thread-local (__thread):
|
|||
|
|
✓ No locks needed
|
|||
|
|
✓ No atomic operations
|
|||
|
|
✓ No synchronization required
|
|||
|
|
✓ Safe for all threads
|
|||
|
|
|
|||
|
|
Fallback path:
|
|||
|
|
✓ Registry scan (existing code, proven)
|
|||
|
|
✓ Always works if warm pool empty
|
|||
|
|
✓ Correctness guaranteed
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Memory Safety
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
SuperSlab ownership:
|
|||
|
|
✓ Warm pool only holds SuperSlabs we own
|
|||
|
|
✓ Tier/Guard checks catch invalid cases
|
|||
|
|
✓ On tier change (HOT→DRAINING): removed from pool
|
|||
|
|
✓ Validation on periodic tier checks (batched)
|
|||
|
|
|
|||
|
|
Object layout:
|
|||
|
|
✓ No change to object headers
|
|||
|
|
✓ No change to allocation metadata
|
|||
|
|
✓ Warm pool is transparent to user code
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Tier Transitions
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
If SuperSlab changes tier (HOT → DRAINING):
|
|||
|
|
├─ Current: Caught on next registry scan
|
|||
|
|
├─ Proposed: Caught on next batch tier check
|
|||
|
|
├─ Rare case (only if working set shrinks)
|
|||
|
|
└─ Fallback: Registry scan still works
|
|||
|
|
|
|||
|
|
Validation strategy:
|
|||
|
|
├─ Periodic (batched) tier validation
|
|||
|
|
├─ On cold path (always validates)
|
|||
|
|
├─ Accept small window of stale data
|
|||
|
|
└─ Correctness preserved
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 📊 Success Metrics
|
|||
|
|
|
|||
|
|
### Warm Pool Metrics to Track
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
While running Random Mixed benchmark:
|
|||
|
|
|
|||
|
|
Per-thread warm pool statistics:
|
|||
|
|
├─ Pool capacity: 4 per class (128 total for 32 classes)
|
|||
|
|
├─ Pool hit rate: 85-95% (target: > 90%)
|
|||
|
|
├─ Pool miss rate: 5-15% (fallback to registry)
|
|||
|
|
└─ Pool push rate: On cold alloc (should be rare)
|
|||
|
|
|
|||
|
|
Cache refill metrics:
|
|||
|
|
├─ Warm pool refills: ~50,000 (90% of misses)
|
|||
|
|
├─ Registry fallbacks: ~5,000 (10% of misses)
|
|||
|
|
└─ Cold allocations: 10-100 (very rare)
|
|||
|
|
|
|||
|
|
Performance metrics:
|
|||
|
|
├─ Total ops/s: 1.5M+ (target: +40% from 1.06M)
|
|||
|
|
├─ Ops per cycle: 0.05+ (from 0.015 baseline)
|
|||
|
|
└─ Cache miss overhead: Reduced by 80%+
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
### Regression Tests
|
|||
|
|
|
|||
|
|
```
|
|||
|
|
Ensure no degradation:
|
|||
|
|
✓ Tiny Hot: 89M ops/s (unchanged)
|
|||
|
|
✓ Tiny Cold: No regression expected
|
|||
|
|
✓ Tiny Middle: No regression expected
|
|||
|
|
✓ Memory correctness: All tests pass
|
|||
|
|
✓ Multithreaded: No race conditions
|
|||
|
|
✓ Thread safety: Concurrent access safe
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 🚀 Recommended Next Steps
|
|||
|
|
|
|||
|
|
### Step 1: Agree on Scope
|
|||
|
|
- [ ] Accept Phase 1 (warm pool) proposal
|
|||
|
|
- [ ] Defer Phase 2 (advanced optimizations) to later
|
|||
|
|
- [ ] Do not attempt Phase 3 (architectural rewrite)
|
|||
|
|
|
|||
|
|
### Step 2: Create Warm Pool Implementation
|
|||
|
|
- [ ] Create `core/front/tiny_warm_pool.h`
|
|||
|
|
- [ ] Implement data structures and operations
|
|||
|
|
- [ ] Write inline functions for hot operations
|
|||
|
|
|
|||
|
|
### Step 3: Integrate with Unified Cache
|
|||
|
|
- [ ] Modify `unified_cache_refill()` to use warm pool
|
|||
|
|
- [ ] Add initialization logic
|
|||
|
|
- [ ] Test correctness
|
|||
|
|
|
|||
|
|
### Step 4: Benchmark & Validate
|
|||
|
|
- [ ] Run Random Mixed benchmark
|
|||
|
|
- [ ] Measure ops/s improvement (target: 1.5M+)
|
|||
|
|
- [ ] Profile warm pool hit rate (target: > 90%)
|
|||
|
|
- [ ] Verify no regression in other workloads
|
|||
|
|
|
|||
|
|
### Step 5: Iterate & Refine
|
|||
|
|
- [ ] If hit rate < 80%: Increase warm pool size
|
|||
|
|
- [ ] If hit rate > 95%: Reduce warm pool size (save memory)
|
|||
|
|
- [ ] If performance < 1.4M ops/s: Review bottlenecks
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## 🎯 Conclusion
|
|||
|
|
|
|||
|
|
**Warm pool implementation offers:**
|
|||
|
|
- High ROI (40-50% improvement with 200-300 lines of code)
|
|||
|
|
- Low risk (fallback to proven registry scan path)
|
|||
|
|
- Incremental approach (doesn't require full redesign)
|
|||
|
|
- Clear success criteria (ops/s improvement, hit rate tracking)
|
|||
|
|
|
|||
|
|
**Expected outcome:**
|
|||
|
|
- Random Mixed: 1.06M → 1.5M+ ops/s (+40%)
|
|||
|
|
- Tiny Hot: 89M ops/s (unchanged)
|
|||
|
|
- Total system: Better performance for real-world workloads
|
|||
|
|
|
|||
|
|
**Path to further improvements:**
|
|||
|
|
- Phase 2 (advanced): +20-30% more (1.8-2.0M ops/s)
|
|||
|
|
- Phase 3 (redesign): Not recommended (high effort, limited gain)
|
|||
|
|
|
|||
|
|
**Recommendation:** Implement Phase 1 warm pool. Re-evaluate after measuring actual performance.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
**Document Status:** Ready for implementation
|
|||
|
|
**Review & Approval:** Required before starting code changes
|