hakmem/docs/analysis/PHASE59B_SPEED_FIRST_REBASE_RESULTS.md
Moe Charm (CI) ef8e2ab9b5 Phase 59b & 61: Speed-first Rebase + C7 ULTRA Header-Light Optimization
Phase 59b: Speed-first Mode Baseline Rebase
- Rebase on MIXED_TINYV3_C7_SAFE profile (Speed-first, no prewarm suppression)
- hakmem: 58.478 M ops/s (CV 2.52%)
- mimalloc: 120.979 M ops/s (CV 0.90%)
- Ratio: 48.34% of mimalloc (down from 49.13% Balanced mode in Phase 59)
- Reason for difference: Profile selection (Speed-first vs Balanced) and mimalloc environment variance
- Status: COMPLETE (measurement-only, zero code changes)

Phase 61: C7 ULTRA Header-Light Optimization Attempt
- Objective: Skip header write on C7 ULTRA alloc hit (write only on refill)
- Implementation: ENV gate HAKMEM_TINY_C7_ULTRA_HEADER_LIGHT (default OFF)
- Result: +0.31% (NEUTRAL, below +1.0% GO threshold)
  - Baseline: 59.543 M ops/s (CV 1.53%)
  - Treatment: 59.729 M ops/s (CV 2.66%)
- Root cause analysis:
  - tiny_region_id_write_header only 2.32% of time (lower than Phase 42 estimate 4.56%)
  - Header-light mode adds branch to hot path, negating write savings
  - Mixed workload dilutes C7-specific optimization effectiveness
  - Variance increased due to branch prediction variability
- Decision: Kept as research box with ENV gate (default OFF)
- Lesson: Workload-specific optimizations need careful verification with full workloads

Updated Documentation:
- PHASE59B_SPEED_FIRST_REBASE_RESULTS.md: Full measurement results and analysis
- PHASE61_C7_ULTRA_HEADER_LIGHT_RESULTS.md: A/B test results and root cause analysis
- PHASE61_C7_ULTRA_HEADER_LIGHT_IMPLEMENTATION.md: Implementation details and design
- CURRENT_TASK.md: Updated status and next phase planning (Phase 62)
- PERFORMANCE_TARGETS_SCORECARD.md: Updated baseline and M1 milestone status

M1 (50%) Milestone Status:
- Current: 48.34% (Speed-first profile)
- Gap: -1.66pp (within measurement noise)
- Profile recommendation: Speed-first as canonical default for throughput focus

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-17 16:25:26 +09:00


Phase 59b: Speed-first Rebase Results

Date: 2025-12-17

Objective: Measure baseline with Speed-first mode (HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE) and update baseline ratio.


Background

Phase 59 used Balanced mode, but Phase 57's 60-minute soak test showed Speed-first mode wins across all metrics:

  • Throughput: Speed-first is higher (not -3.0% as previously recorded)
  • CV: Speed-first 1.58% < Balanced 5.38%
  • Tail p99: Speed-first 19.14 ns/op < Balanced 20.78 ns/op

This phase re-measures baseline with Speed-first as the canonical configuration.


Build Configuration

make clean
make bench_random_mixed_hakmem_minimal
make bench_random_mixed_mi

Profile: MIXED_TINYV3_C7_SAFE (Speed-first)
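
The ten-run measurement could be driven by a small script like the one below. This is a sketch, not the project's actual harness. The binary path `./bench_random_mixed_hakmem_minimal` (inferred from the make target name), the absence of command-line arguments, and the assumption that each run prints an integer followed by `ops/s` are all hypothetical. Only the `HAKMEM_PROFILE` value comes from this document.

```python
import os
import re
import subprocess


def run_bench(cmd, runs=10, profile="MIXED_TINYV3_C7_SAFE"):
    """Run `cmd` `runs` times with HAKMEM_PROFILE set; return ops/s per run."""
    # Copy the current environment and select the Speed-first profile.
    env = dict(os.environ, HAKMEM_PROFILE=profile)
    results = []
    for _ in range(runs):
        proc = subprocess.run(
            cmd, env=env, capture_output=True, text=True, check=True
        )
        # Assumed output format: an integer followed by "ops/s".
        match = re.search(r"(\d+)\s*ops/s", proc.stdout)
        if match is None:
            raise ValueError("no 'ops/s' figure found in benchmark output")
        results.append(int(match.group(1)))
    return results
```

Under those assumptions, `run_bench(["./bench_random_mixed_hakmem_minimal"])` would return a list of ten throughput figures, and passing the mimalloc binary instead would give the comparison series.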


Results

HAKMEM (Speed-first, 10 runs)

Run  1: 59703498 ops/s
Run  2: 58304610 ops/s
Run  3: 57661940 ops/s
Run  4: 58971883 ops/s
Run  5: 54922424 ops/s
Run  6: 58840032 ops/s
Run  7: 59513137 ops/s
Run  8: 57656603 ops/s
Run  9: 59560261 ops/s
Run 10: 59641284 ops/s

Statistics:

  • Mean: 58,477,567 ops/s
  • Median: 58,905,958 ops/s
  • Min: 54,922,424 ops/s
  • Max: 59,703,498 ops/s
  • CV: 2.52%
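
The summary statistics can be recomputed from the ten runs listed above. The following is a minimal sketch that assumes CV means the sample standard deviation (n-1 denominator) divided by the mean, expressed as a percentage. The document does not state which estimator was used, but this choice reproduces the reported 2.52%.

```python
import statistics

# HAKMEM Speed-first throughput, ops/s, copied from the run list above.
hakmem_runs = [
    59703498, 58304610, 57661940, 58971883, 54922424,
    58840032, 59513137, 57656603, 59560261, 59641284,
]

mean = statistics.mean(hakmem_runs)
median = statistics.median(hakmem_runs)
# Assumption: CV = sample standard deviation / mean, as a percentage.
cv_pct = statistics.stdev(hakmem_runs) / mean * 100

print(f"Mean:   {mean:,.0f} ops/s")
print(f"Median: {median:,.0f} ops/s")
print(f"Min:    {min(hakmem_runs):,} ops/s")
print(f"Max:    {max(hakmem_runs):,} ops/s")
print(f"CV:     {cv_pct:.2f}%")
```

Run 5 (54.9M ops/s) is the single low outlier. It pulls the mean below the median and accounts for most of the spread.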

mimalloc (10 runs)

Run  1: 121727781 ops/s
Run  2: 122378721 ops/s
Run  3: 120826927 ops/s
Run  4: 119288198 ops/s
Run  5: 121275784 ops/s
Run  6: 119825073 ops/s
Run  7: 120096029 ops/s
Run  8: 121769295 ops/s
Run  9: 120555258 ops/s
Run 10: 122051669 ops/s

Statistics:

  • Mean: 120,979,474 ops/s
  • Median: 121,051,356 ops/s
  • Min: 119,288,198 ops/s
  • Max: 122,378,721 ops/s
  • CV: 0.90%

Ratio Calculation

HAKMEM / mimalloc: 58,477,567 / 120,979,474 = 48.34%
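
As a worked check of the ratio, the sketch below divides the two means and also reports the distance to the 50% M1 milestone. The variable names are illustrative, and measuring the gap in percentage points is an assumption about how the milestone is tracked.

```python
hakmem_mean = 58_477_567      # ops/s, Speed-first mean
mimalloc_mean = 120_979_474   # ops/s, mimalloc mean
M1_TARGET_PCT = 50.0          # M1 milestone: 50% of mimalloc

ratio_pct = hakmem_mean / mimalloc_mean * 100
gap_pp = ratio_pct - M1_TARGET_PCT
# Throughput HAKMEM would need to reach M1 against this mimalloc mean.
needed_ops = mimalloc_mean * M1_TARGET_PCT / 100

print(f"Ratio: {ratio_pct:.2f}% of mimalloc")
print(f"Gap to M1: {gap_pp:+.2f}pp")
print(f"Throughput needed for M1: {needed_ops:,.0f} ops/s")
```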

Comparison with Phase 59

| Metric | Phase 59 (Balanced) | Phase 59b (Speed-first) | Delta |
|---|---|---|---|
| HAKMEM Mean | 58,476,000 ops/s | 58,477,567 ops/s | +0.00% |
| mimalloc Mean | 119,086,000 ops/s | 120,979,474 ops/s | +1.59% |
| Ratio | 49.13% | 48.34% | -0.79pp |

Note: The ratio is 0.79pp lower than in Phase 59 because the mimalloc mean rose by 1.59% between the two measurements, not because HAKMEM regressed. The HAKMEM mean is effectively unchanged (+0.00%).
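
The Delta column mixes two units: relative change in percent for the throughput rows, and a difference in percentage points for the ratio row. The sketch below recomputes both from the reported figures. The helper name `rel_change_pct` is illustrative, and the Phase 59 means are the rounded values from the comparison, so the Phase 59 ratio is taken as reported rather than recomputed.

```python
def rel_change_pct(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Throughput rows: relative change in percent.
hakmem_delta = rel_change_pct(58_476_000, 58_477_567)
mimalloc_delta = rel_change_pct(119_086_000, 120_979_474)

# Ratio row: difference of two percentages, in percentage points (pp).
ratio_delta_pp = 48.34 - 49.13

print(f"HAKMEM mean:   {hakmem_delta:+.2f}%")
print(f"mimalloc mean: {mimalloc_delta:+.2f}%")
print(f"Ratio:         {ratio_delta_pp:+.2f}pp")
```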


Conclusion

Status: COMPLETED

Findings:

  1. Speed-first mode is the correct baseline (lower CV, better tail latency)
  2. New baseline ratio: 48.34% (down 0.79pp from Phase 59 due to mimalloc variation)
  3. HAKMEM throughput remains stable at ~58.5M ops/s

Recommendation:

  • Adopt Speed-first (MIXED_TINYV3_C7_SAFE) as canonical default
  • Update PERFORMANCE_TARGETS_SCORECARD.md with new baseline
  • Use 48.34% as reference for future comparisons

Next Steps

  • Phase 61: C7 ULTRA header-light optimization
  • Target: +1.0% improvement from header write elimination