# Phase 3-GRADUATE: C6 ULTRA Intrusive LIFO Validation Results

**Phase**: TLS-UNIFY-3 (Phase 3-GRADUATE)  
**Date**: 2025-12-12  
**Objective**: Validate C6 ULTRA intrusive LIFO freelist vs array magazine performance  
**Test**: C6-heavy workload (257-512B, 1M iterations, ws=200)

---

## Executive Summary

✅ **OVERALL STATUS: PASS**

The ULTRA+intrusive implementation meets both graduation criteria:
- Performance: -1.44% vs Baseline (within the 5% regression tolerance)
- Intrusive LIFO: working correctly, with 0 fallbacks
- ULTRA+array is the faster variant (+3.79% vs Baseline); this phase validates the intrusive design for correctness, not as the fastest option

---

## Phase 3-GRADUATE-0: Research Preset Addition

**Status**: ✅ Complete

Added new research preset `C6_ULTRA_INTRUSIVE_EXPERIMENT_V12` to:
- **File**: `/mnt/workdisk/public_share/hakmem/docs/analysis/ENV_PROFILE_PRESETS.md`
- **Description**: Phase TLS-UNIFY-3 validation - C6 ULTRA intrusive LIFO vs array magazine
- **Environment Variables**:
  - `HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1` (routes C6 to ULTRA path)
  - `HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1` (enables intrusive LIFO)
- **Warning**: ULTRA routing overrides MID v3/v3.5 - use only in research context
- **Usage**: Mixed or C6-heavy workloads - adjust HAKMEM_BENCH_MIN_SIZE/MAX_SIZE as needed

---

## Phase 3-GRADUATE-1: C6-Heavy A/B Test Results

### Test Configuration

- **Workload**: Random mixed allocation/deallocation
- **Working Set**: ws=200
- **Size Range**: 257-512B (C6 class only)
- **Iterations**: 1,000,000 per run
- **Runs per condition**: 5
- **Environment**: `HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512`
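
The configuration above can be driven from a small script. The following is a minimal sketch, not the harness actually used for these results: the names `run_condition` and `C6_HEAVY_ENV` are hypothetical, and the function only collects raw stdout because the benchmark's output format is not shown in this report.

```python
import os
import subprocess
import sys

def run_condition(cmd, env_overrides, runs=5):
    """Run `cmd` `runs` times with extra environment variables.

    Returns the captured stdout of each run, in order.
    """
    env = dict(os.environ)
    env.update(env_overrides)
    outputs = []
    for _ in range(runs):
        result = subprocess.run(
            cmd, env=env, capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout)
    return outputs

# Size range shared by all three conditions (from the Test Configuration list).
C6_HEAVY_ENV = {"HAKMEM_BENCH_MIN_SIZE": "257", "HAKMEM_BENCH_MAX_SIZE": "512"}

# Example call (needs the benchmark binary in the current directory):
# outputs = run_condition(
#     ["./bench_random_mixed_hakmem", "1000000", "200"], C6_HEAVY_ENV
# )
```

Per-condition variables such as `HAKMEM_TINY_C6_ULTRA_FREE_ENABLED` would be merged into the overrides dictionary for the ULTRA runs.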

### Test Conditions

1. **Baseline**: C6=MID v3.5 (no ULTRA routing)
2. **ULTRA+array**: C6=ULTRA with array magazine (intrusive FL OFF)
3. **ULTRA+intrusive**: C6=ULTRA with intrusive LIFO (intrusive FL ON)
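
To make the difference between conditions 2 and 3 concrete, here is an illustrative model of the two freelist styles. It is a sketch under stated assumptions, not hakmem's code: the class names, the fixed magazine capacity, and the byte-offset arena are all hypothetical. The intrusive variant stores the "next" link inside the freed block itself; the array magazine keeps block references in a separate array.

```python
import struct

BLOCK_SIZE = 512             # upper bound of the 257-512B size range
NULL = 0xFFFFFFFFFFFFFFFF    # sentinel offset meaning "end of list"

class IntrusiveLifo:
    """Freelist whose 'next' link lives in the first 8 bytes of each free block."""

    def __init__(self, arena):
        self.arena = arena
        self.head = NULL

    def push(self, offset):
        # Write the old head into the freed block, then make the block the head.
        struct.pack_into("<Q", self.arena, offset, self.head)
        self.head = offset

    def pop(self):
        if self.head == NULL:
            return None      # empty: caller must take another path
        offset = self.head
        (self.head,) = struct.unpack_from("<Q", self.arena, offset)
        return offset

class ArrayMagazine:
    """Freelist that keeps block offsets in a separate fixed-capacity array."""

    def __init__(self, capacity):
        self.slots = [0] * capacity
        self.top = 0

    def push(self, offset):
        if self.top == len(self.slots):
            return False     # full: caller must take another path
        self.slots[self.top] = offset
        self.top += 1
        return True

    def pop(self):
        if self.top == 0:
            return None
        self.top -= 1
        return self.slots[self.top]

arena = bytearray(BLOCK_SIZE * 4)
lifo = IntrusiveLifo(arena)
mag = ArrayMagazine(capacity=4)
for off in (0, 512, 1024):
    lifo.push(off)
    mag.push(off)

intrusive_order = [lifo.pop() for _ in range(4)]
magazine_order = [mag.pop() for _ in range(4)]
print(intrusive_order)   # [1024, 512, 0, None]
print(magazine_order)    # [1024, 512, 0, None]
```

Both structures return blocks in last-in, first-out order. The intrusive list needs no side storage but touches the freed block's memory on every push and pop, while the magazine only touches its own array.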

---

## Detailed Results

### Condition 1: Baseline (C6=MID v3.5)

**Command**:
```bash
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
  ./bench_random_mixed_hakmem 1000000 200
```

**Results** (5 runs):
- Run 1: 54,742,076 ops/s
- Run 2: 57,557,163 ops/s
- Run 3: 56,503,212 ops/s
- Run 4: 52,315,248 ops/s
- Run 5: 55,362,087 ops/s
- **Mean**: 55,295,957 ops/s
- **StdDev**: 1,985,340

**Route**: C6 → LEGACY (MID v3.5 path)

---

### Condition 2: ULTRA+array (Array Magazine)

**Command**:
```bash
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
  HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 \
  HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=0 \
  HAKMEM_FREE_PATH_STATS=1 \
  ./bench_random_mixed_hakmem 1000000 200
```

**Results** (5 runs):
- Run 1: 57,122,577 ops/s
- Run 2: 58,482,856 ops/s
- Run 3: 56,339,501 ops/s
- Run 4: 57,055,995 ops/s
- Run 5: 57,944,578 ops/s
- **Mean**: 57,389,101 ops/s
- **StdDev**: 834,942

**Performance vs Baseline**: +3.79%

**Stats**:
- c6_ultra_free: 265,890
- c6_ultra_alloc: 265,815
- c6_ifl_push: 0 (array magazine mode)
- c6_ifl_pop: 0
- c6_ifl_fallback: 0

**Route**: C6 → ULTRA (array magazine)

---

### Condition 3: ULTRA+intrusive (Intrusive LIFO)

**Command**:
```bash
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
  HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 \
  HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1 \
  HAKMEM_FREE_PATH_STATS=1 \
  ./bench_random_mixed_hakmem 1000000 200
```

**Results** (5 runs):
- Run 1: 56,710,065 ops/s
- Run 2: 56,314,297 ops/s
- Run 3: 52,936,109 ops/s
- Run 4: 50,111,993 ops/s
- Run 5: 56,427,447 ops/s
- **Mean**: 54,499,982 ops/s
- **StdDev**: 2,897,908

**Performance vs Baseline**: -1.44%

**Stats**:
- c6_ultra_free: 265,890
- c6_ultra_alloc: 265,815
- c6_ifl_push: 265,890 (intrusive LIFO active)
- c6_ifl_pop: 265,815
- c6_ifl_fallback: 0 ✅

**Route**: C6 → ULTRA (intrusive LIFO freelist)

---

## Evaluation Against Graduation Gates

### Gate 1: C6-heavy Performance

**Criteria**: ULTRA+intrusive >= Baseline (or small regression < 5%)

**Result**: -1.44% vs Baseline

**Status**: ✅ PASS

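- The regression is within the 5% tolerance
- Performance is competitive with the baseline MID v3.5 implementation
- The 1.44% gap (about 0.8M ops/s) is smaller than the run-to-run StdDev of either condition (2.0M and 2.9M ops/s), so with 5 runs it is not clearly distinguishable from noise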

### Gate 2: Intrusive LIFO Fallback Rate

**Criteria**: c6_ifl_fallback maintained at low level (close to 0)

**Result**: c6_ifl_fallback = 0

**Status**: ✅ PASS

- `c6_ifl_fallback` = 0: no operation had to leave the intrusive path
- Pushes match frees exactly: c6_ifl_push = c6_ultra_free = 265,890
- Pops match allocations exactly: c6_ifl_pop = c6_ultra_alloc = 265,815
- The 75 extra pushes are blocks that were freed but never re-allocated; they most likely remained on the freelist when the benchmark ended
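
The two gates can be written as simple checks. This is a sketch only: the function names are hypothetical, the Gate 2 threshold `max_rate` is an assumption (the criterion says only "close to 0"), and using the push count as the denominator of the fallback rate is also an assumption.

```python
TOLERANCE_PCT = 5.0  # Gate 1: allowed regression vs Baseline, in percent

def gate1_performance(candidate_mean, baseline_mean, tolerance_pct=TOLERANCE_PCT):
    """Return (delta in percent, passed) for the throughput gate."""
    delta_pct = (candidate_mean - baseline_mean) / baseline_mean * 100.0
    return delta_pct, delta_pct > -tolerance_pct

def gate2_fallback(fallbacks, pushes, max_rate=0.0):
    """Return (fallback rate, passed) for the intrusive-path gate."""
    rate = fallbacks / pushes if pushes else 0.0
    return rate, rate <= max_rate

# Values reported in this document.
delta, gate1_ok = gate1_performance(54_499_982, 55_295_957)
rate, gate2_ok = gate2_fallback(fallbacks=0, pushes=265_890)

print(f"Gate 1: {delta:+.2f}% -> {'PASS' if gate1_ok else 'FAIL'}")
print(f"Gate 2: fallback rate {rate:.4f} -> {'PASS' if gate2_ok else 'FAIL'}")
```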

---

## Analysis and Insights

### Performance Comparison Summary

| Condition | Mean (ops/s) | vs Baseline | StdDev | Route |
|-----------|--------------|-------------|---------|-------|
| Baseline (MID v3.5) | 55,295,957 | - | 1,985,340 | LEGACY |
| ULTRA+array | 57,389,101 | +3.79% | 834,942 | ULTRA |
| ULTRA+intrusive | 54,499,982 | -1.44% | 2,897,908 | ULTRA |
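
The table can be reproduced from the per-run figures listed under Detailed Results. The sketch below assumes the StdDev column is the sample standard deviation (n-1 denominator); the function name `summarize` is hypothetical.

```python
from statistics import mean, stdev

# Per-run throughput in ops/s, copied from the Detailed Results section.
RUNS = {
    "Baseline (MID v3.5)": [54_742_076, 57_557_163, 56_503_212, 52_315_248, 55_362_087],
    "ULTRA+array": [57_122_577, 58_482_856, 56_339_501, 57_055_995, 57_944_578],
    "ULTRA+intrusive": [56_710_065, 56_314_297, 52_936_109, 50_111_993, 56_427_447],
}

def summarize(runs, baseline="Baseline (MID v3.5)"):
    """Compute mean, sample standard deviation, and delta vs baseline per condition."""
    baseline_mean = mean(runs[baseline])
    rows = {}
    for name, samples in runs.items():
        m = mean(samples)
        rows[name] = {
            "mean": m,
            "stdev": stdev(samples),  # sample standard deviation (n-1)
            "vs_baseline_pct": (m - baseline_mean) / baseline_mean * 100.0,
        }
    return rows

summary = summarize(RUNS)
for name, row in summary.items():
    print(
        f"{name:22s} mean={row['mean']:>13,.0f}  "
        f"stdev={row['stdev']:>11,.0f}  "
        f"vs baseline={row['vs_baseline_pct']:+.2f}%"
    )
```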

### Key Observations

1. **Array Magazine Wins in Raw Performance**:
   - ULTRA+array shows +3.79% improvement over baseline
   - Lowest standard deviation (834,942) indicates stable performance
   - Best performer in this C6-heavy workload

2. **Intrusive LIFO Shows Acceptable Performance**:
   - The -1.44% regression is within the 5% tolerance
   - Higher standard deviation (2,897,908), driven by two slow runs (Run 3: 52.9M, Run 4: 50.1M ops/s), warrants investigation
   - Zero fallback shows that every operation stayed on the intrusive path

3. **Intrusive LIFO Correctness Validated**:
   - c6_ifl_fallback=0 confirms that the intrusive freelist handled every ULTRA free and alloc
   - Push/pop counts (265,890/265,815) match the ULTRA free/alloc counters exactly
   - No corruption or failures observed during the 1M-iteration runs

4. **Route Assignment Working**:
   - Baseline: C6 → LEGACY (as expected)
   - ULTRA modes: C6 → ULTRA (routing override working)
   - C7 remains LEGACY in all cases (only C6 affected)

### Performance Variability Analysis

**Standard Deviation Comparison**:
- Baseline: 1.99M ops/s (3.6% CV)
- ULTRA+array: 0.83M ops/s (1.5% CV)
- ULTRA+intrusive: 2.90M ops/s (5.3% CV)
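
The CV figures are the standard deviation divided by the mean. A short sketch of that arithmetic, using the reported means and standard deviations as inputs (the helper name `cv_percent` is hypothetical):

```python
# (mean ops/s, StdDev ops/s) as reported in the summary table.
REPORTED = {
    "Baseline": (55_295_957, 1_985_340),
    "ULTRA+array": (57_389_101, 834_942),
    "ULTRA+intrusive": (54_499_982, 2_897_908),
}

def cv_percent(mean_ops, stddev_ops):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return stddev_ops / mean_ops * 100.0

cv = {name: cv_percent(m, s) for name, (m, s) in REPORTED.items()}
for name, value in cv.items():
    print(f"{name:16s} CV = {value:.1f}%")
```

Because CV is normalized by the mean, it allows variability to be compared across conditions whose throughput differs.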

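The higher variability in ULTRA+intrusive is driven mainly by two slow runs (Run 3: 52.9M, Run 4: 50.1M ops/s). Possible explanations and implications:
- Cache/TLB effects from reading and writing link pointers inside freed blocks (not yet confirmed by profiling)
- Room for micro-optimization in the intrusive path
- Variability is still within acceptable bounds for research validation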

---

## Conclusions

### Phase 3-GRADUATE Status: ✅ PASS

Both phases completed successfully:

**Phase 3-GRADUATE-0 (Research Preset)**:
- ✅ Research preset `C6_ULTRA_INTRUSIVE_EXPERIMENT_V12` added to ENV_PROFILE_PRESETS.md
- ✅ Documentation includes warnings, usage guidelines, and test commands
- ✅ Performance results updated with actual test data

**Phase 3-GRADUATE-1 (C6-Heavy A/B Test)**:
- ✅ Gate 1: -1.44% vs Baseline (within the 5% tolerance)
- ✅ Gate 2: c6_ifl_fallback=0 (no fallbacks from the intrusive path)
- ✅ 5 runs per condition completed successfully
- ✅ Statistical analysis shows acceptable variability

### Recommendations

1. **For Production**: Use ULTRA+array configuration
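   - Best performance in this test (+3.79% over baseline)
   - Lowest variability (StdDev: 835K ops/s, 1.5% CV)
   - Result covers the C6-heavy workload only; confirm on mixed workloads before enabling by default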

2. **For Research**: ULTRA+intrusive validated for correctness
   - Zero fallback confirms implementation correctness
   - Performance acceptable for further optimization work
   - Good foundation for TLS unification experiments

3. **Next Steps**:
   - Consider mixed workload testing (16-1024B) to validate broader impact
   - Investigate sources of variability in intrusive path
   - Profile intrusive LIFO to identify optimization opportunities
   - Consider hybrid approach: array for hot path, intrusive for cold/overflow

### Files Modified

- `/mnt/workdisk/public_share/hakmem/docs/analysis/ENV_PROFILE_PRESETS.md`
  - Added Research Profile 4: C6_ULTRA_INTRUSIVE_EXPERIMENT_V12
  - Updated with actual performance results

### Performance Data Archive

Raw data captured during this test session:
- Baseline: 5 runs completed
- ULTRA+array: 5 runs completed
- ULTRA+intrusive: 5 runs completed
- FREE_PATH_STATS captured for all ULTRA conditions
- IFL stats (push/pop/fallback) captured for intrusive mode

---

**Test Execution**: 2025-12-12  
**Total Runs**: 15 (5 per condition)  
**Test Environment**: /mnt/workdisk/public_share/hakmem  
**Benchmark Binary**: bench_random_mixed_hakmem  
**Git Branch**: master (Phase v11a-4+)