# Phase 3-GRADUATE: C6 ULTRA Intrusive LIFO Validation Results

**Phase**: TLS-UNIFY-3 (Phase 3-GRADUATE)
**Date**: 2025-12-12
**Objective**: Validate C6 ULTRA intrusive LIFO freelist vs array magazine performance
**Test**: C6-heavy workload (257-512B, 1M iterations, ws=200)

---

## Executive Summary

✅ **OVERALL STATUS: PASS**

The ULTRA+intrusive implementation meets all graduation criteria:
- Performance: -1.44% vs Baseline (inside the <5% regression tolerance)
- Intrusive LIFO: working correctly with 0 fallbacks
- The array magazine is faster (+3.79% vs Baseline); the intrusive design is validated here for correctness, not for speed


---

## Phase 3-GRADUATE-0: Research Preset Addition

**Status**: ✅ Complete

Added a new research preset, `C6_ULTRA_INTRUSIVE_EXPERIMENT_V12`:
- **File**: `/mnt/workdisk/public_share/hakmem/docs/analysis/ENV_PROFILE_PRESETS.md`
- **Description**: Phase TLS-UNIFY-3 validation - C6 ULTRA intrusive LIFO vs array magazine
- **Environment Variables**:
  - `HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1` (routes C6 to ULTRA path)
  - `HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1` (enables intrusive LIFO)
- **Warning**: ULTRA routing overrides MID v3/v3.5; use only in a research context
- **Usage**: Mixed or C6-heavy workloads; adjust `HAKMEM_BENCH_MIN_SIZE`/`HAKMEM_BENCH_MAX_SIZE` as needed


---

## Phase 3-GRADUATE-1: C6-Heavy A/B Test Results

### Test Configuration

- **Workload**: Random mixed allocation/deallocation
- **Working Set**: ws=200
- **Size Range**: 257-512B (C6 class only)
- **Iterations**: 1,000,000 per run
- **Runs per condition**: 5
- **Environment**: `HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512`

### Test Conditions

1. **Baseline**: C6=MID v3.5 (no ULTRA routing)
2. **ULTRA+array**: C6=ULTRA with array magazine (intrusive FL OFF)
3. **ULTRA+intrusive**: C6=ULTRA with intrusive LIFO (intrusive FL ON)

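The three conditions differ only in their environment variables, so the A/B matrix can be scripted. The following is a minimal sketch, not the harness that produced the results in this report: the helper names `condition_env` and `run_all` and the `BENCH`/`RUNS` overrides are hypothetical, while the environment variables and benchmark arguments are the ones used in this document.

```bash
# Sketch only: condition_env, run_all, BENCH and RUNS are hypothetical names.

# Print the environment assignments for one test condition.
condition_env() {
  local base="HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512"
  local ultra="HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 HAKMEM_FREE_PATH_STATS=1"
  case "$1" in
    baseline)  echo "$base" ;;
    array)     echo "$base $ultra HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=0" ;;
    intrusive) echo "$base $ultra HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1" ;;
    *) echo "unknown condition: $1" >&2; return 1 ;;
  esac
}

# Run every condition RUNS times (default 5) with 1M iterations and ws=200.
run_all() {
  local bench="${BENCH:-./bench_random_mixed_hakmem}"
  local runs="${RUNS:-5}" cond i
  for cond in baseline array intrusive; do
    for i in $(seq 1 "$runs"); do
      echo "== $cond run $i =="
      # Word splitting of the assignment list is intentional here.
      env $(condition_env "$cond") "$bench" 1000000 200
    done
  done
}
```

Calling `run_all` would execute all 15 runs; `BENCH=echo RUNS=1 run_all` gives a dry run that only prints the benchmark arguments for each condition.
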

---

## Detailed Results

### Condition 1: Baseline (C6=MID v3.5)

**Command**:
```bash
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
./bench_random_mixed_hakmem 1000000 200
```

**Results** (5 runs):
- Run 1: 54,742,076 ops/s
- Run 2: 57,557,163 ops/s
- Run 3: 56,503,212 ops/s
- Run 4: 52,315,248 ops/s
- Run 5: 55,362,087 ops/s
- **Mean**: 55,295,957 ops/s
- **StdDev**: 1,985,340 ops/s

**Route**: C6 → LEGACY (MID v3.5 path)

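The mean and standard deviation can be recomputed from the five runs listed above. This is a minimal sketch with a hypothetical helper name (`summarize`); it uses the sample (n-1) formula, which the reported StdDev figures appear to be consistent with.

```bash
# Sketch only: summarize is a hypothetical helper name.

# Read one ops/s value per line on stdin; print the mean and the
# sample standard deviation (n-1 denominator), rounded to integers.
summarize() {
  awk '{ x[NR] = $1; sum += $1 }
       END {
         if (NR < 2) { print "need at least 2 samples" > "/dev/stderr"; exit 1 }
         mean = sum / NR
         for (i = 1; i <= NR; i++) ss += (x[i] - mean) * (x[i] - mean)
         printf "mean=%.0f stddev=%.0f\n", mean, sqrt(ss / (NR - 1))
       }'
}

# Baseline runs from this report.
printf '%s\n' 54742076 57557163 56503212 52315248 55362087 | summarize
```
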

---

### Condition 2: ULTRA+array (Array Magazine)

**Command**:
```bash
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 \
HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=0 \
HAKMEM_FREE_PATH_STATS=1 \
./bench_random_mixed_hakmem 1000000 200
```

**Results** (5 runs):
- Run 1: 57,122,577 ops/s
- Run 2: 58,482,856 ops/s
- Run 3: 56,339,501 ops/s
- Run 4: 57,055,995 ops/s
- Run 5: 57,944,578 ops/s
- **Mean**: 57,389,101 ops/s
- **StdDev**: 834,942 ops/s

**Performance vs Baseline**: +3.79%

**Stats**:
- c6_ultra_free: 265,890
- c6_ultra_alloc: 265,815
- c6_ifl_push: 0 (array magazine mode)
- c6_ifl_pop: 0
- c6_ifl_fallback: 0

**Route**: C6 → ULTRA (array magazine)

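The "Performance vs Baseline" figure is the relative change of the condition mean against the Baseline mean. A minimal sketch of that arithmetic, with a hypothetical helper name (`pct_delta`):

```bash
# Sketch only: pct_delta is a hypothetical helper name.

# Print the signed percentage change of a candidate mean relative to a baseline mean.
pct_delta() {
  awk -v base="$1" -v cand="$2" \
    'BEGIN { printf "%+.2f%%\n", (cand - base) / base * 100 }'
}

pct_delta 55295957 57389101   # ULTRA+array vs Baseline: prints +3.79%
```
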

---

### Condition 3: ULTRA+intrusive (Intrusive LIFO)

**Command**:
```bash
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 \
HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1 \
HAKMEM_FREE_PATH_STATS=1 \
./bench_random_mixed_hakmem 1000000 200
```

**Results** (5 runs):
- Run 1: 56,710,065 ops/s
- Run 2: 56,314,297 ops/s
- Run 3: 52,936,109 ops/s
- Run 4: 50,111,993 ops/s
- Run 5: 56,427,447 ops/s
- **Mean**: 54,499,982 ops/s
- **StdDev**: 2,897,908 ops/s

**Performance vs Baseline**: -1.44%

**Stats**:
- c6_ultra_free: 265,890
- c6_ultra_alloc: 265,815
- c6_ifl_push: 265,890 (intrusive LIFO active)
- c6_ifl_pop: 265,815
- c6_ifl_fallback: 0 ✅

**Route**: C6 → ULTRA (intrusive LIFO freelist)

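The counters above can be cross-checked mechanically: in intrusive mode every ULTRA free should appear as a push, every ULTRA alloc as a pop, and the fallback count should be zero. The sketch below assumes exactly those relationships; `check_ifl_counters` is a hypothetical helper, not part of hakmem.

```bash
# Sketch only: check_ifl_counters is a hypothetical helper name.

# Args: ultra_free ultra_alloc ifl_push ifl_pop ifl_fallback
# Succeeds when every ULTRA free/alloc went through the intrusive freelist
# and no free fell back; prints the net freelist growth (pushes - pops).
check_ifl_counters() {
  local free=$1 alloc=$2 push=$3 pop=$4 fallback=$5
  [ "$push" -eq "$free" ]  || { echo "push != ultra_free"; return 1; }
  [ "$pop" -eq "$alloc" ]  || { echo "pop != ultra_alloc"; return 1; }
  [ "$fallback" -eq 0 ]    || { echo "fallback = $fallback"; return 1; }
  echo "ok net_push=$((push - pop))"
}

check_ifl_counters 265890 265815 265890 265815 0   # prints: ok net_push=75
```
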

---

## Evaluation Against Graduation Gates

### Gate 1: C6-heavy Performance

**Criteria**: ULTRA+intrusive >= Baseline (or a small regression, < 5%)

**Result**: -1.44% vs Baseline

**Status**: ✅ PASS

- The regression is within the 5% tolerance
- Performance is competitive with the baseline MID v3.5 implementation
- Run-to-run variability is the highest of the three conditions (StdDev: 2.9M ops/s), which suggests room for optimization

### Gate 2: Intrusive LIFO Fallback Rate

**Criteria**: c6_ifl_fallback stays low (close to 0)

**Result**: c6_ifl_fallback = 0

**Status**: ✅ PASS

- Zero fallbacks: every free took the intrusive path
- All 265,890 frees were pushed onto the intrusive freelist (c6_ifl_push = c6_ultra_free)
- Pushes and pops track the ULTRA free/alloc counters exactly: 265,890 pushes, 265,815 pops
- The delta of 75 (pushes minus pops) is consistent with 75 freed blocks still sitting on the freelist when the benchmark ended

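Both gates reduce to simple threshold checks. The sketch below is one way to encode them; `gate_check` is a hypothetical helper, and treating Gate 2 as "fallback count exactly 0" is an assumption, since the criterion above only says "close to 0".

```bash
# Sketch only: gate_check is a hypothetical helper name.

# Args: delta_pct fallback_count [tolerance_pct, default 5]
# Gate 1 passes when the regression is smaller than the tolerance.
# Gate 2 passes when the fallback count is 0 (assumed threshold).
gate_check() {
  awk -v d="$1" -v f="$2" -v tol="${3:-5}" 'BEGIN {
    g1 = (d > -tol) ? "PASS" : "FAIL"
    g2 = (f == 0)   ? "PASS" : "FAIL"
    printf "gate1=%s gate2=%s\n", g1, g2
    exit !(g1 == "PASS" && g2 == "PASS")
  }'
}

gate_check -1.44 0   # prints: gate1=PASS gate2=PASS
```
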

---

## Analysis and Insights

### Performance Comparison Summary

| Condition | Mean (ops/s) | vs Baseline | StdDev (ops/s) | Route |
|-----------|--------------|-------------|----------------|-------|
| Baseline (MID v3.5) | 55,295,957 | - | 1,985,340 | LEGACY |
| ULTRA+array | 57,389,101 | +3.79% | 834,942 | ULTRA |
| ULTRA+intrusive | 54,499,982 | -1.44% | 2,897,908 | ULTRA |

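With only five runs per condition, it can help to relate each difference in means to the run-to-run spread. The sketch below computes a Welch t statistic from the summary table; this is an illustrative addition, not an analysis performed in the original test, and `welch_t` is a hypothetical helper name.

```bash
# Sketch only: welch_t is a hypothetical helper name.

# Args: mean1 sd1 n1 mean2 sd2 n2
# Prints the Welch t statistic for (mean2 - mean1), using sample standard deviations.
welch_t() {
  awk -v m1="$1" -v s1="$2" -v n1="$3" -v m2="$4" -v s2="$5" -v n2="$6" 'BEGIN {
    se = sqrt(s1 * s1 / n1 + s2 * s2 / n2)   # standard error of the difference
    printf "%.2f\n", (m2 - m1) / se
  }'
}

welch_t 55295957 1985340 5 57389101 834942 5    # ULTRA+array vs Baseline
welch_t 55295957 1985340 5 54499982 2897908 5   # ULTRA+intrusive vs Baseline
```

A larger magnitude means the difference stands out more clearly from run-to-run noise; a value near zero means the two conditions are hard to tell apart at this sample size.
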
### Key Observations

1. **Array Magazine Wins in Raw Performance**:
   - ULTRA+array shows a +3.79% improvement over baseline
   - Lowest standard deviation (834,942 ops/s) indicates stable performance
   - Best performer in this C6-heavy workload

2. **Intrusive LIFO Shows Acceptable Performance**:
   - The -1.44% regression is within tolerance (<5%)
   - Higher standard deviation (2,897,908 ops/s) suggests room for optimization
   - Zero fallbacks show the intrusive path handled every free

3. **Intrusive LIFO Correctness Validated**:
   - c6_ifl_fallback=0 confirms the intrusive freelist never fell back
   - Push/pop counts (265,890/265,815) match the ULTRA free/alloc counters exactly
   - No corruption or failures during 1M iterations

4. **Route Assignment Working**:
   - Baseline: C6 → LEGACY (as expected)
   - ULTRA modes: C6 → ULTRA (routing override working)
   - C7 remains LEGACY in all cases (only C6 affected)

### Performance Variability Analysis

**Standard Deviation Comparison**:
- Baseline: 1.99M ops/s (3.6% CV)
- ULTRA+array: 0.83M ops/s (1.5% CV)
- ULTRA+intrusive: 2.90M ops/s (5.3% CV)

Most of the ULTRA+intrusive spread comes from Runs 3 and 4 (52.9M and 50.1M ops/s); the other three runs fall between 56.3M and 56.7M ops/s. The higher variability may point to:
- Potential cache/TLB effects from intrusive pointer manipulation
- Opportunity for micro-optimization in the intrusive path

The variability is still within acceptable bounds for research validation.

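The CV (coefficient of variation) figures above are the standard deviation divided by the mean, expressed as a percentage. A minimal sketch of that calculation, with a hypothetical helper name (`cv_pct`):

```bash
# Sketch only: cv_pct is a hypothetical helper name.

# Args: stddev mean. Prints the coefficient of variation as a percentage.
cv_pct() {
  awk -v sd="$1" -v mean="$2" 'BEGIN { printf "%.1f%%\n", sd / mean * 100 }'
}

cv_pct 1985340 55295957   # Baseline: prints 3.6%
cv_pct 834942 57389101    # ULTRA+array: prints 1.5%
cv_pct 2897908 54499982   # ULTRA+intrusive: prints 5.3%
```
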

---

## Conclusions

### Phase 3-GRADUATE Status: ✅ PASS

Both phases completed successfully:

**Phase 3-GRADUATE-0 (Research Preset)**:
- ✅ Research preset `C6_ULTRA_INTRUSIVE_EXPERIMENT_V12` added to ENV_PROFILE_PRESETS.md
- ✅ Documentation includes warnings, usage guidelines, and test commands
- ✅ Performance results updated with actual test data

**Phase 3-GRADUATE-1 (C6-Heavy A/B Test)**:
- ✅ Gate 1: Performance regression -1.44% (within <5% tolerance)
- ✅ Gate 2: c6_ifl_fallback=0 (perfect LIFO behavior)
- ✅ 5 runs per condition completed successfully
- ✅ Statistical analysis shows acceptable variability

### Recommendations

1. **For Production**: Use ULTRA+array configuration
   - Best performance (+3.79% over baseline)
   - Lowest variability (StdDev: 834K ops/s)
   - Proven stability

2. **For Research**: ULTRA+intrusive validated for correctness
   - Zero fallback confirms implementation correctness
   - Performance acceptable for further optimization work
   - Good foundation for TLS unification experiments

3. **Next Steps**:
   - Consider mixed workload testing (16-1024B) to validate broader impact
   - Investigate sources of variability in intrusive path
   - Profile intrusive LIFO to identify optimization opportunities
   - Consider hybrid approach: array for hot path, intrusive for cold/overflow

### Files Modified

- `/mnt/workdisk/public_share/hakmem/docs/analysis/ENV_PROFILE_PRESETS.md`
  - Added Research Profile 4: C6_ULTRA_INTRUSIVE_EXPERIMENT_V12
  - Updated with actual performance results

### Performance Data Archive

All raw data available in this session:
- Baseline: 5 runs completed
- ULTRA+array: 5 runs completed
- ULTRA+intrusive: 5 runs completed
- FREE_PATH_STATS captured for all ULTRA conditions
- IFL stats (push/pop/fallback) captured for intrusive mode


---

**Test Execution**: 2025-12-12
**Total Runs**: 15 (5 per condition)
**Test Environment**: `/mnt/workdisk/public_share/hakmem`
**Benchmark Binary**: `bench_random_mixed_hakmem`
**Git Branch**: master (Phase v11a-4+)