## Phase POLICY-FAST-PATH-V2 (FROZEN)
- Implementation complete: free_policy_fast_v2_box.h + malloc_tiny_fast.h integration
- A/B Results:
  - Mixed (ws=400): -1.6% regression ❌ (branch cost > skip benefit)
  - C6-heavy (ws=200): +5.4% improvement ✅
- Decision: Default OFF, FROZEN (ws<300 / C6-heavy research only)
- Learning: Large WS causes branch misprediction to dominate

## Phase 3-GRADUATE + ENV probe fix
- 64-probe retry for getenv() stability during bench_profile putenv()
- C6 ULTRA intrusive freelist: FROZEN (research box)

## Phase MID-V35-HOTPATH-OPT-1-DESIGN
- Design doc for next optimization target
- Target: MID v3.5 alloc/free hot path (C5-C6)
- Boxes: Stats Gate, TLS Layout, Boundary Check elimination
- Expected: +3-9% on Mixed mainline

Files:
- core/box/free_policy_fast_v2_box.h (new)
- core/box/free_path_stats_box.h/c (policy_fast_v2_skip counter)
- core/front/malloc_tiny_fast.h (fast-path integration)
- docs/analysis/MID_V35_HOTPATH_OPT_1_DESIGN.md (new)
- docs/analysis/PHASE_3_GRADUATE_*.md (new)
- CURRENT_TASK.md (phase status update)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Phase 3-GRADUATE: C6 ULTRA Intrusive LIFO Validation Results
Phase: TLS-UNIFY-3 (Phase 3-GRADUATE)
Date: 2025-12-12
Objective: Validate C6 ULTRA intrusive LIFO freelist vs array magazine performance
Test: C6-heavy workload (257-512B, 1M iterations, ws=200)
Executive Summary
✅ OVERALL STATUS: PASS
ULTRA+intrusive implementation meets all graduation criteria:
- Performance: -1.44% vs Baseline (within <5% tolerance)
- Intrusive LIFO: Working correctly with 0 fallback
- ULTRA+array is faster (+3.79% vs Baseline), but the purpose of this phase is to validate the intrusive design for correctness
Phase 3-GRADUATE-0: Research Preset Addition
Status: ✅ Complete
Added new research preset C6_ULTRA_INTRUSIVE_EXPERIMENT_V12 to:
- File: /mnt/workdisk/public_share/hakmem/docs/analysis/ENV_PROFILE_PRESETS.md
- Description: Phase TLS-UNIFY-3 validation - C6 ULTRA intrusive LIFO vs array magazine
- Environment Variables:
  - HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 (routes C6 to ULTRA path)
  - HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1 (enables intrusive LIFO)
- Warning: ULTRA routing overrides MID v3/v3.5 - use only in research context
- Usage: Mixed or C6-heavy workloads - adjust HAKMEM_BENCH_MIN_SIZE/MAX_SIZE as needed
Phase 3-GRADUATE-1: C6-Heavy A/B Test Results
Test Configuration
- Workload: Random mixed allocation/deallocation
- Working Set: ws=200
- Size Range: 257-512B (C6 class only)
- Iterations: 1,000,000 per run
- Runs per condition: 5
- Environment: HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512
Test Conditions
- Baseline: C6=MID v3.5 (no ULTRA routing)
- ULTRA+array: C6=ULTRA with array magazine (intrusive FL OFF)
- ULTRA+intrusive: C6=ULTRA with intrusive LIFO (intrusive FL ON)
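The three conditions differ only in environment variables, so the A/B matrix can be scripted. The following is a minimal sketch, not the harness used for this report: `build_env` and `run_condition` are hypothetical helper names, and since the benchmark's stdout format is not documented here, the runner returns raw output instead of parsing ops/s.

```python
import os
import subprocess

# Size range shared by all conditions (C6 class only), as used in this report.
SIZE_RANGE = {"HAKMEM_BENCH_MIN_SIZE": "257", "HAKMEM_BENCH_MAX_SIZE": "512"}

# Per-condition overrides, copied from the commands listed in this report.
CONDITIONS = {
    "Baseline": {},
    "ULTRA+array": {
        "HAKMEM_TINY_C6_ULTRA_FREE_ENABLED": "1",
        "HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL": "0",
        "HAKMEM_FREE_PATH_STATS": "1",
    },
    "ULTRA+intrusive": {
        "HAKMEM_TINY_C6_ULTRA_FREE_ENABLED": "1",
        "HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL": "1",
        "HAKMEM_FREE_PATH_STATS": "1",
    },
}

# Keys that must not leak in from the caller's shell, or Baseline would be polluted.
ULTRA_KEYS = set(CONDITIONS["ULTRA+array"])


def build_env(condition, base=None):
    """Return the full environment for one condition (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    for key in ULTRA_KEYS:
        env.pop(key, None)
    env.update(SIZE_RANGE)
    env.update(CONDITIONS[condition])
    return env


def run_condition(condition, runs=5, binary="./bench_random_mixed_hakmem"):
    """Run the benchmark `runs` times and return each run's raw stdout."""
    outputs = []
    for _ in range(runs):
        proc = subprocess.run(
            [binary, "1000000", "200"],  # iterations, working set
            env=build_env(condition),
            capture_output=True,
            text=True,
            check=True,
        )
        outputs.append(proc.stdout)
    return outputs
```

Clearing the ULTRA variables before applying overrides is a deliberate choice here: it keeps a stray export in the calling shell from silently turning the Baseline into an ULTRA run.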
Detailed Results
Condition 1: Baseline (C6=MID v3.5)
Command:
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
./bench_random_mixed_hakmem 1000000 200
Results (5 runs):
- Run 1: 54,742,076 ops/s
- Run 2: 57,557,163 ops/s
- Run 3: 56,503,212 ops/s
- Run 4: 52,315,248 ops/s
- Run 5: 55,362,087 ops/s
- Mean: 55,295,957 ops/s
- StdDev: 1,985,340
Route: C6 → LEGACY (MID v3.5 path)
Condition 2: ULTRA+array (Array Magazine)
Command:
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 \
HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=0 \
HAKMEM_FREE_PATH_STATS=1 \
./bench_random_mixed_hakmem 1000000 200
Results (5 runs):
- Run 1: 57,122,577 ops/s
- Run 2: 58,482,856 ops/s
- Run 3: 56,339,501 ops/s
- Run 4: 57,055,995 ops/s
- Run 5: 57,944,578 ops/s
- Mean: 57,389,101 ops/s
- StdDev: 834,942
Performance vs Baseline: +3.79%
Stats:
- c6_ultra_free: 265,890
- c6_ultra_alloc: 265,815
- c6_ifl_push: 0 (array magazine mode)
- c6_ifl_pop: 0
- c6_ifl_fallback: 0
Route: C6 → ULTRA (array magazine)
Condition 3: ULTRA+intrusive (Intrusive LIFO)
Command:
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 \
HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1 \
HAKMEM_FREE_PATH_STATS=1 \
./bench_random_mixed_hakmem 1000000 200
Results (5 runs):
- Run 1: 56,710,065 ops/s
- Run 2: 56,314,297 ops/s
- Run 3: 52,936,109 ops/s
- Run 4: 50,111,993 ops/s
- Run 5: 56,427,447 ops/s
- Mean: 54,499,982 ops/s
- StdDev: 2,897,908
Performance vs Baseline: -1.44%
Stats:
- c6_ultra_free: 265,890
- c6_ultra_alloc: 265,815
- c6_ifl_push: 265,890 (intrusive LIFO active)
- c6_ifl_pop: 265,815
- c6_ifl_fallback: 0 ✅
Route: C6 → ULTRA (intrusive LIFO freelist)
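The c6_ifl_push / c6_ifl_pop counters above refer to an intrusive LIFO, in which a freed block's own first bytes hold the link to the next free block, so no separate pointer array is needed. Below is a toy model of that idea, assuming a flat `bytearray` arena and 8-byte offsets as links. The class, field names, and sentinel are illustrative only and are not hakmem's actual implementation.

```python
import struct

BLOCK_SIZE = 512            # upper end of the C6 size range used in this test
NULL = 0xFFFFFFFFFFFFFFFF   # sentinel offset meaning "no next block"


class IntrusiveLifo:
    """Toy intrusive LIFO: the link lives in the first 8 bytes of each free block."""

    def __init__(self, arena):
        self.arena = arena      # bytearray standing in for slab memory
        self.head = NULL
        self.push_count = 0     # analogous in spirit to c6_ifl_push
        self.pop_count = 0      # analogous in spirit to c6_ifl_pop

    def push(self, offset):
        # Store the old head inside the freed block, then make the block the new head.
        struct.pack_into("<Q", self.arena, offset, self.head)
        self.head = offset
        self.push_count += 1

    def pop(self):
        # Return the most recently pushed block, or None when the list is empty.
        if self.head == NULL:
            return None
        offset = self.head
        (self.head,) = struct.unpack_from("<Q", self.arena, offset)
        self.pop_count += 1
        return offset
```

In this model `push_count - pop_count` always equals the number of blocks currently on the list, which is a convenient invariant to keep in mind when reading the counters.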
Evaluation Against Graduation Gates
Gate 1: C6-heavy Performance
Criteria: ULTRA+intrusive >= Baseline (or small regression < 5%)
Result: -1.44% vs Baseline
Status: ✅ PASS
- Regression is within the acceptable tolerance of 5%
- Performance is competitive with baseline MID v3.5 implementation
- The 0.80M ops/s gap to Baseline is smaller than the run-to-run StdDev of either condition (1.99M and 2.90M), so with 5 runs it is within measurement noise
Gate 2: Intrusive LIFO Fallback Rate
Criteria: c6_ifl_fallback maintained at low level (close to 0)
Result: c6_ifl_fallback = 0
Status: ✅ PASS
- Perfect LIFO behavior with zero fallback
- All 265,890 frees successfully used intrusive freelist
- Push/pop counts are consistent: 265,890 pushes, 265,815 pops
- The delta of 75 (pushes minus pops) is most plausibly blocks still cached on the freelist at the end of the benchmark: freed, but not yet reused by an allocation
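The two gates can be written down as a small check. This is a sketch under stated assumptions: `evaluate_gates` is a hypothetical helper, and because the report does not give a numeric threshold for "close to 0", Gate 2 is interpreted through a `max_fallback` parameter that defaults to zero.

```python
def evaluate_gates(baseline_mean, candidate_mean, ifl_push, ifl_pop, ifl_fallback,
                   max_regression_pct=5.0, max_fallback=0):
    """Apply the two graduation gates to one candidate condition."""
    delta_pct = (candidate_mean - baseline_mean) / baseline_mean * 100.0
    return {
        "delta_pct": delta_pct,
        # Gate 1: candidate is faster, or regresses by less than the tolerance.
        "gate1_pass": delta_pct > -max_regression_pct,
        # Gate 2: fallback count stays at or below the (assumed) threshold.
        "gate2_pass": ifl_fallback <= max_fallback,
        # Pushes minus pops: blocks pushed but never popped again.
        "residual_blocks": ifl_push - ifl_pop,
    }


# Figures from Condition 1 (Baseline) and Condition 3 (ULTRA+intrusive).
result = evaluate_gates(55_295_957, 54_499_982, 265_890, 265_815, 0)
```

With the reported figures, both gates pass and the push/pop residual is 75.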
Analysis and Insights
Performance Comparison Summary
| Condition | Mean (ops/s) | vs Baseline | StdDev | Route |
|---|---|---|---|---|
| Baseline (MID v3.5) | 55,295,957 | - | 1,985,340 | LEGACY |
| ULTRA+array | 57,389,101 | +3.79% | 834,942 | ULTRA |
| ULTRA+intrusive | 54,499,982 | -1.44% | 2,897,908 | ULTRA |
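The table can be reproduced from the per-run figures with the standard library. The sketch below assumes the reported StdDev is the sample standard deviation (n-1 denominator), which is what `statistics.stdev` computes; `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Per-run throughput in ops/s, copied from the Detailed Results section.
RUNS = {
    "Baseline": [54_742_076, 57_557_163, 56_503_212, 52_315_248, 55_362_087],
    "ULTRA+array": [57_122_577, 58_482_856, 56_339_501, 57_055_995, 57_944_578],
    "ULTRA+intrusive": [56_710_065, 56_314_297, 52_936_109, 50_111_993, 56_427_447],
}


def summarize(runs, baseline="Baseline"):
    """Return mean, sample stdev, and percent delta vs baseline per condition."""
    base_mean = mean(runs[baseline])
    rows = {}
    for name, samples in runs.items():
        m = mean(samples)
        rows[name] = {
            "mean": m,
            "stdev": stdev(samples),  # sample standard deviation (n-1)
            "vs_baseline_pct": (m - base_mean) / base_mean * 100.0,
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(RUNS).items():
        print(f"{name:16s} {row['mean']:>14,.0f} {row['vs_baseline_pct']:+6.2f}% {row['stdev']:>12,.0f}")
```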
Key Observations
- Array Magazine Wins in Raw Performance:
  - ULTRA+array shows +3.79% improvement over baseline
  - Lowest standard deviation (834,942) indicates stable performance
  - Best performer in this C6-heavy workload
- Intrusive LIFO Shows Acceptable Performance:
  - -1.44% regression is within tolerance (<5%)
  - Higher standard deviation (2,897,908) suggests room for optimization
  - Zero fallback demonstrates correct implementation
- Intrusive LIFO Correctness Validated:
  - c6_ifl_fallback=0 confirms intrusive freelist working perfectly
  - Push/pop balance (265,890/265,815) shows proper LIFO behavior
  - No corruption or failures during 1M iterations
- Route Assignment Working:
  - Baseline: C6 → LEGACY (as expected)
  - ULTRA modes: C6 → ULTRA (routing override working)
  - C7 remains LEGACY in all cases (only C6 affected)
Performance Variability Analysis
Standard Deviation Comparison:
- Baseline: 1.99M ops/s (3.6% CV)
- ULTRA+array: 0.83M ops/s (1.5% CV)
- ULTRA+intrusive: 2.90M ops/s (5.3% CV)
The higher variability in ULTRA+intrusive suggests:
- Potential cache/TLB effects from intrusive pointer manipulation
- Opportunity for micro-optimization in the intrusive path
- Still within acceptable bounds for research validation
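To put the CV figures and the mean differences on a common footing, the coefficient of variation and a Welch t statistic can be computed from the reported means and standard deviations. This is an illustrative addition rather than part of the original analysis: the helper names are hypothetical, and with only 5 runs per condition the statistic is a rough guide, not a formal significance test.

```python
from math import sqrt


def coefficient_of_variation(mean_value, stdev_value):
    """StdDev as a percentage of the mean."""
    return stdev_value / mean_value * 100.0


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for the difference mean_a - mean_b (unequal variances)."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Means and standard deviations from the Performance Comparison Summary, n=5 each.
BASELINE = (55_295_957, 1_985_340, 5)
ULTRA_ARRAY = (57_389_101, 834_942, 5)
ULTRA_INTRUSIVE = (54_499_982, 2_897_908, 5)

t_intrusive = welch_t(*ULTRA_INTRUSIVE, *BASELINE)
t_array = welch_t(*ULTRA_ARRAY, *BASELINE)
```

On the reported figures, `t_intrusive` comes out at about -0.5 and `t_array` at about +2.2: the intrusive regression is well inside the noise, while the array-magazine gain stands out more clearly but still rests on a small sample.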
Conclusions
Phase 3-GRADUATE Status: ✅ PASS
Both phases completed successfully:
Phase 3-GRADUATE-0 (Research Preset):
- ✅ Research preset C6_ULTRA_INTRUSIVE_EXPERIMENT_V12 added to ENV_PROFILE_PRESETS.md
- ✅ Documentation includes warnings, usage guidelines, and test commands
- ✅ Performance results updated with actual test data
Phase 3-GRADUATE-1 (C6-Heavy A/B Test):
- ✅ Gate 1: Performance regression -1.44% (within <5% tolerance)
- ✅ Gate 2: c6_ifl_fallback=0 (perfect LIFO behavior)
- ✅ 5 runs per condition completed successfully
- ✅ Run-to-run variability is acceptable (CV between 1.5% and 5.3%)
Recommendations
- For Production: Use ULTRA+array configuration
  - Best performance (+3.79% over baseline)
  - Lowest variability (StdDev: 834K)
  - Stable across all 5 runs (1.5% CV)
- For Research: ULTRA+intrusive validated for correctness
  - Zero fallback confirms implementation correctness
  - Performance acceptable for further optimization work
  - Good foundation for TLS unification experiments
- Next Steps:
  - Consider mixed workload testing (16-1024B) to validate broader impact
  - Investigate sources of variability in intrusive path
  - Profile intrusive LIFO to identify optimization opportunities
  - Consider hybrid approach: array for hot path, intrusive for cold/overflow
Files Modified
- /mnt/workdisk/public_share/hakmem/docs/analysis/ENV_PROFILE_PRESETS.md
  - Added Research Profile 4: C6_ULTRA_INTRUSIVE_EXPERIMENT_V12
  - Updated with actual performance results
Performance Data Archive
All raw data available in this session:
- Baseline runs: 5 iterations completed
- ULTRA+array runs: 5 iterations completed
- ULTRA+intrusive runs: 5 iterations completed
- FREE_PATH_STATS captured for all ULTRA conditions
- IFL stats (push/pop/fallback) captured for intrusive mode
Test Execution: 2025-12-12
Total Runs: 15 (5 per condition)
Test Environment: /mnt/workdisk/public_share/hakmem
Benchmark Binary: bench_random_mixed_hakmem
Git Branch: master (Phase v11a-4+)