# Phase 60: Alloc Pass-Down SSOT - A/B Test Results

**Date**: 2025-12-17
**Verdict**: **NO-GO** (-0.46%)

## Executive Summary

Phase 60 attempted to reduce redundant computation in the allocation path by computing the ENV snapshot, route kind, C7 ULTRA flag, and DUALHOT flag once at the entry point and passing them down to the allocation logic (the SSOT, or single-source-of-truth, pattern, similar to the free-side Phase 19-6C).

**Result**: The SSOT approach introduced a slight performance regression (-0.46%), making it a NO-GO. The added branch check `if (alloc_passdown_ssot_enabled())` and the cost of computing the context upfront outweighed any benefit from removing duplicate calculations.

---

## Step 0: Runtime Profiling (Prerequisite Check)

**Command**:
```bash
perf record -F 99 -g -- ./bench_random_mixed_hakmem_minimal 200000000 400 1
perf report --no-children | head -60
```

**Top 50 Functions** (overhead %):
```
31.27%  malloc
28.60%  free
21.82%  main
 4.14%  tiny_c7_ultra_alloc.constprop.0
 3.69%  free_tiny_fast_compute_route_and_heap.lto_priv.0
 3.50%  tiny_region_id_write_header.lto_priv.0
 2.16%  tiny_c7_ultra_free
 1.21%  unified_cache_push.lto_priv.0
 1.00%  hak_free_at.part.0
 0.47%  hak_force_libc_alloc.lto_priv.0
 0.46%  hak_super_lookup.part.0.lto_priv.4.lto_priv.0
 0.45%  hak_pool_try_alloc_v1_impl.part.0
```

**Conclusion**: Alloc-side functions (`malloc` at 31.27%, `tiny_c7_ultra_alloc` at 4.14%, `tiny_region_id_write_header` at 3.50%) all appear in the Top 50, confirming that the alloc path is worth investigating in this phase.

---

## A/B Test Results (Mixed Benchmark, 10-run)

### Baseline (HAKMEM_ALLOC_PASSDOWN_SSOT=0)

**Command**:
```bash
make bench_random_mixed_hakmem_minimal
BENCH_BIN=./bench_random_mixed_hakmem_minimal scripts/run_mixed_10_cleanenv.sh
```

**Results**:
```
Run 1: 60411170 ops/s
Run 2: 59748852 ops/s
Run 3: 59978565 ops/s
Run 4: 60709007 ops/s
Run 5: 60525102 ops/s
Run 6: 60140203 ops/s
Run 7: 58531001 ops/s
Run 8: 59976257 ops/s
Run 9: 59847921 ops/s
Run 10: 60617511 ops/s
```

**Statistics**:
- **Mean**: 60,048,559 ops/s
- **Min**: 58,531,001 ops/s
- **Max**: 60,709,007 ops/s
- **StdDev**: 597,500 ops/s (population standard deviation, N = 10)
- **CV**: 1.00%

---

### Treatment (HAKMEM_ALLOC_PASSDOWN_SSOT=1)

**Command**:
```bash
make clean
make bench_random_mixed_hakmem_minimal EXTRA_CFLAGS='-DHAKMEM_ALLOC_PASSDOWN_SSOT=1'
BENCH_BIN=./bench_random_mixed_hakmem_minimal scripts/run_mixed_10_cleanenv.sh
```

**Results**:
```
Run 1: 60961455 ops/s
Run 2: 60006558 ops/s
Run 3: 59090044 ops/s
Run 4: 60244712 ops/s
Run 5: 60909895 ops/s
Run 6: 60470043 ops/s
Run 7: 59077611 ops/s
Run 8: 58890407 ops/s
Run 9: 60107925 ops/s
Run 10: 57966046 ops/s
```

**Statistics**:
- **Mean**: 59,772,470 ops/s
- **Min**: 57,966,046 ops/s
- **Max**: 60,961,455 ops/s
- **StdDev**: 925,965 ops/s (population standard deviation, N = 10)
- **CV**: 1.55%

---

## Comparison

| Metric   | Baseline (SSOT=0) | Treatment (SSOT=1) | Delta      |
|----------|-------------------|--------------------|------------|
| **Mean** | 60,048,559 ops/s  | 59,772,470 ops/s   | **-0.46%** |
| **CV**   | 1.00%             | 1.55%              | +0.55pp    |
| **Min**  | 58,531,001 ops/s  | 57,966,046 ops/s   | -0.97%     |
| **Max**  | 60,709,007 ops/s  | 60,961,455 ops/s   | +0.42%     |

**Verdict**: **NO-GO** (mean throughput regressed by 0.46%; the loss is smaller than the -1.0% threshold, but the change is still negative)

---
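
The Mean, Min, and Max rows of the Comparison table can be re-derived from the raw per-run numbers. The following is a minimal sketch in C; the helper names (`mean_of`, `min_of`, `max_of`, `pct_delta`) are illustrative and are not part of HAKMEM or its benchmark scripts.

```c
#include <assert.h>
#include <stddef.h>

/* Per-run throughput in ops/s, copied from the Results blocks above. */
static const double baseline_runs[10] = {
    60411170, 59748852, 59978565, 60709007, 60525102,
    60140203, 58531001, 59976257, 59847921, 60617511
};
static const double treatment_runs[10] = {
    60961455, 60006558, 59090044, 60244712, 60909895,
    60470043, 59077611, 58890407, 60107925, 57966046
};

/* Arithmetic mean of n samples (n must be non-zero). */
static double mean_of(const double *x, size_t n) {
    assert(x != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += x[i];
    return sum / (double)n;
}

/* Smallest of n samples. */
static double min_of(const double *x, size_t n) {
    assert(x != NULL && n > 0);
    double m = x[0];
    for (size_t i = 1; i < n; i++) if (x[i] < m) m = x[i];
    return m;
}

/* Largest of n samples. */
static double max_of(const double *x, size_t n) {
    assert(x != NULL && n > 0);
    double m = x[0];
    for (size_t i = 1; i < n; i++) if (x[i] > m) m = x[i];
    return m;
}

/* Relative change of `treatment` versus `baseline`, in percent. */
static double pct_delta(double baseline, double treatment) {
    return (treatment - baseline) / baseline * 100.0;
}
```

Calling `pct_delta(mean_of(baseline_runs, 10), mean_of(treatment_runs, 10))` gives roughly -0.46, matching the table. That delta is smaller than the CV of either run set (1.00% and 1.55%), which is worth keeping in mind when weighing the verdict.
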
## Root Cause Analysis

### 1. Added Branch Overhead

The SSOT approach requires a branch check at the entry point:
```c
/* Entry-point gate: taken only when the SSOT path is enabled. */
if (alloc_passdown_ssot_enabled()) {
    /* Compute the shared context once, then pass it down by pointer. */
    alloc_passdown_context_t ctx = alloc_passdown_context_compute(class_idx);
    return malloc_tiny_fast_for_class_ssot(size, class_idx, &ctx);
}
```

Even though `alloc_passdown_ssot_enabled()` is a compile-time constant in `HAKMEM_BENCH_MINIMAL` builds (so the test itself should fold away), the gated path still changes the shape of the entry function, and that is counted as branch overhead here.
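
One way to picture a compile-time-constant gate is a function that collapses to a literal under a build flag. The sketch below is hypothetical: only the names `alloc_passdown_ssot_enabled` and `HAKMEM_ALLOC_PASSDOWN_SSOT` come from this report, while the function body, the `PATH_*` constants, and `entry_path` are assumptions made for illustration and may not match the HAKMEM source.

```c
#include <assert.h>

/* Default OFF, as in the baseline build; the treatment build passes
 * -DHAKMEM_ALLOC_PASSDOWN_SSOT=1 on the command line. */
#ifndef HAKMEM_ALLOC_PASSDOWN_SSOT
#define HAKMEM_ALLOC_PASSDOWN_SSOT 0
#endif

static_assert(HAKMEM_ALLOC_PASSDOWN_SSOT == 0 || HAKMEM_ALLOC_PASSDOWN_SSOT == 1,
              "gate must be 0 or 1");

/* Assumed shape of the gate: returns a constant the optimizer can fold. */
static inline int alloc_passdown_ssot_enabled(void) {
    return HAKMEM_ALLOC_PASSDOWN_SSOT;
}

/* Hypothetical labels for the two allocation paths. */
enum { PATH_ORIGINAL = 0, PATH_SSOT = 1 };

/* Hypothetical entry point: selects a path based on the gate. */
static inline int entry_path(void) {
    if (alloc_passdown_ssot_enabled()) {
        return PATH_SSOT;     /* reachable only in SSOT=1 builds */
    }
    return PATH_ORIGINAL;
}
```

With a gate of this shape, an optimizing build would normally leave no runtime comparison behind. Whether that holds for the actual HAKMEM build would need to be confirmed in the disassembly.
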
### 2. Duplicate Context Computation

The `alloc_passdown_context_compute()` function computes:
- ENV snapshot (`hakmem_env_snapshot()`)
- C7 ULTRA enabled (`tiny_c7_ultra_enabled_env()`)
- DUALHOT enabled (`alloc_dualhot_enabled()`)
- Route kind (`tiny_static_route_get_kind_fast()` or `tiny_policy_hot_get_route_with_env()`)

However, the **original path already computes these values on demand**, and only when they are needed. The SSOT path computes them **upfront**, even when they go unused (e.g., if C7 ULTRA hits early, the route kind is never needed).
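
The difference between on-demand and upfront computation can be made concrete with a toy cost model that counts lookups. Everything in this sketch is hypothetical: the `demo_*` functions stand in for the four lookups listed above, each is charged one unit of cost, and the `cls <= 3` condition for DUALHOT is an invented placeholder. Only the `class_idx == 7` check for C7 ULTRA mirrors a condition quoted elsewhere in this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of context lookups performed since the last reset. */
static int g_lookups = 0;

/* Hypothetical stand-ins for the four lookups; each call counts as one. */
static int  demo_env_snapshot(void)  { g_lookups++; return 1; }
static bool demo_c7_ultra_on(void)   { g_lookups++; return true; }
static bool demo_dualhot_on(void)    { g_lookups++; return true; }
static int  demo_route_kind(int cls) { g_lookups++; return cls & 3; }

enum { EXIT_C7_ULTRA = 1, EXIT_DUALHOT = 2, EXIT_ROUTED = 3 };

/* On-demand path: a lookup runs only if control actually reaches it. */
static int alloc_on_demand(int cls) {
    if (cls == 7 && demo_c7_ultra_on()) return EXIT_C7_ULTRA;
    if (cls <= 3 && demo_dualhot_on())  return EXIT_DUALHOT; /* assumed class range */
    int env = demo_env_snapshot();
    int route = demo_route_kind(cls);
    (void)env; (void)route;
    return EXIT_ROUTED;
}

/* Hypothetical pass-down context holding all four results. */
typedef struct {
    int  env;
    bool c7_ultra_on;
    bool dualhot_on;
    int  route;
} demo_ctx_t;

/* Upfront path: all four lookups run before any early exit is tested. */
static int alloc_upfront(int cls) {
    demo_ctx_t ctx;
    ctx.env = demo_env_snapshot();
    ctx.c7_ultra_on = demo_c7_ultra_on();
    ctx.dualhot_on = demo_dualhot_on();
    ctx.route = demo_route_kind(cls);
    if (cls == 7 && ctx.c7_ultra_on) return EXIT_C7_ULTRA;
    if (cls <= 3 && ctx.dualhot_on)  return EXIT_DUALHOT;
    return EXIT_ROUTED;
}

/* Returns how many lookups a single call to `path` performs. */
static int lookups_for(int (*path)(int), int cls) {
    assert(path);
    g_lookups = 0;
    (void)path(cls);
    return g_lookups;
}
```

In this model a class-7 request costs one lookup on the on-demand path and four on the upfront path, while both paths return the same result. The real lookups may be far cheaper than a function call, so the sketch only illustrates the direction of the effect, not its size.
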
### 3. Pass-Down Overhead

The `alloc_passdown_context_t` struct is passed by pointer to `malloc_tiny_fast_for_class_ssot()`. Unless the call is inlined, the context may have to be materialized in memory, which may introduce ABI overhead (register pressure, stack spills).

### 4. No Actual Redundancy Reduction

The original path has **early exits** (C7 ULTRA, DUALHOT) that avoid expensive computations in the common case. The SSOT path **forces** all computations upfront, negating the benefit of early exits.

---

## Comparison with Free-Side Phase 19-6C

**Free-Side Success** (Phase 19-6C):
- Free-side had **many branches** and **duplicate policy snapshot calls** across multiple code paths.
- Pass-down eliminated these redundancies, resulting in a **+1.5% improvement**.

**Alloc-Side Failure** (Phase 60):
- Alloc-side already has **early exits** (C7 ULTRA, DUALHOT) that avoid expensive computations.
- The SSOT approach **forces upfront computation**, reducing the benefit of early exits.
- The added branch check (`if (alloc_passdown_ssot_enabled())`) introduces overhead.

**Conclusion**: The SSOT pattern works well when there are **many redundant computations** across multiple code paths, but **fails** when the original path already has **efficient early exits**.

---

## Decision

**NO-GO**: The SSOT approach introduces a slight regression (-0.46%) and does not provide the expected benefits. The implementation will remain **OFF** (default `HAKMEM_ALLOC_PASSDOWN_SSOT=0`), and the code will be kept as a research box for future reference.

---

## Recommendations for Future Phases

1. **Focus on Hot Functions**: Continue profiling to identify the next hottest allocation functions (e.g., `tiny_region_id_write_header`, `unified_cache_push`).
2. **Avoid Upfront Computation**: For allocation paths with early exits, avoid forcing upfront computation. Instead, optimize the early-exit paths directly.
3. **Branch Reduction**: Investigate removing branches from the hot path (e.g., `if (class_idx == 7 && c7_ultra_on)`).
4. **Inline Critical Functions**: Ensure critical functions like `tiny_c7_ultra_alloc` are always inlined to reduce call overhead.

---
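
For recommendation 4, forced inlining on GCC and Clang is usually expressed with the `always_inline` function attribute. The sketch below shows the general pattern only; the `HOT_INLINE` macro, the `free_node_t` type, and `freelist_pop` are hypothetical and do not describe how `tiny_c7_ultra_alloc` is actually written.

```c
#include <assert.h>

/* Force inlining on GCC/Clang; fall back to a plain inline hint elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define HOT_INLINE static inline __attribute__((always_inline))
#else
#define HOT_INLINE static inline
#endif

/* Hypothetical freelist node used only for this illustration. */
typedef struct free_node {
    struct free_node *next;
} free_node_t;

/* Hypothetical hot-path helper: pops the head of a singly linked freelist.
 * Returns a null pointer when the list is empty. */
HOT_INLINE void *freelist_pop(free_node_t **head) {
    assert(head != 0);
    free_node_t *n = *head;
    if (n) {
        *head = n->next;
    }
    return n;
}
```

Forced inlining trades call overhead for code size, so any such change would need its own A/B run like the one in this report.
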
## Box Theory Compliance

- **Single Conversion Point**: The SSOT entry point computes the context once (compliant).
- **Clear Boundaries**: The `alloc_passdown_context_t` struct defines the boundary (compliant).
- **Reversible**: The `HAKMEM_ALLOC_PASSDOWN_SSOT` ENV gate allows rollback (compliant).
- **Performance**: The approach did **not** improve performance (non-compliant).

**Verdict**: Box Theory compliant, but **performance non-compliant** (NO-GO).