hakmem/docs/analysis/BENCH_REPRODUCIBILITY_SSOT.md
Moe Charm (CI) e4c5f05355 Phase 86: Free Path Legacy Mask (NO-GO, +0.25%)
## Summary

Implemented the Phase 86 "mask-only commit" optimization for the free path:
- Bitset mask (0x7f for C0-C6) to identify LEGACY classes
- Direct call to tiny_legacy_fallback_free_base_with_env()
- No indirect function pointers (avoids Phase 85's -0.86% regression)
- Fail-fast on LARSON_FIX=1 (cross-thread validation incompatibility)
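The two-branch check described above can be sketched in C as follows. This is a minimal sketch, and every identifier in it is a hypothetical stand-in:
- The real globals live in core/box/free_path_legacy_mask_box.h.
- tiny_legacy_fallback_free_base_with_env() is replaced by a counting stub, because its signature is not shown here.
- Class indices are assumed to fit in 0..7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical globals (stand-ins for the ones in free_path_legacy_mask_box.h). */
static bool    g_legacy_mask_enabled = false; /* HAKMEM_FREE_PATH_LEGACY_MASK=1 */
static uint8_t g_legacy_mask = 0x7f;          /* bits 0..6 set => classes C0-C6 */

static int g_fallback_calls = 0;

/* Hypothetical stub standing in for tiny_legacy_fallback_free_base_with_env();
   it only counts calls. */
static void legacy_fallback_free_stub(void *base, unsigned class_idx) {
    (void)base;
    (void)class_idx;
    g_fallback_calls++;
}

/* Returns true if the free was routed to the LEGACY fallback. */
static inline bool free_try_legacy_mask(void *base, unsigned class_idx) {
    assert(class_idx < 8);
    /* Branch 1: feature gate (default OFF). */
    if (!g_legacy_mask_enabled) return false;
    /* Branch 2: is this class marked LEGACY in the bitset? */
    if (((g_legacy_mask >> class_idx) & 1u) == 0) return false;
    /* Direct call: no function-pointer table (the Phase 85 approach). */
    legacy_fallback_free_stub(base, class_idx);
    return true;
}
```

The mask test is a single shift-and-AND, and the call target is known at compile time. These are the two properties Phase 86 relied on to avoid Phase 85's indirect-call cost.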

## Results (10-run SSOT)

**NO-GO**: +0.25% improvement (threshold: +1.0%)
- Control:    51,750,467 ops/s (CV: 2.26%)
- Treatment:  51,881,055 ops/s (CV: 2.32%)
- Delta:      +0.25% (mean), -0.15% (median)
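The mean delta follows directly from the two means. Below is a minimal check against the +1.0% GO threshold. It assumes the verdict is taken on the mean, with the median reported separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Relative change of treatment vs. control, in percent. */
static double delta_pct(double control, double treatment) {
    assert(control > 0.0);
    return (treatment - control) / control * 100.0;
}

/* GO only if the mean gain reaches the threshold (+1.0% for Phase 86). */
static bool is_go(double control_mean, double treatment_mean, double threshold_pct) {
    return delta_pct(control_mean, treatment_mean) >= threshold_pct;
}
```

Plugging in 51,750,467 and 51,881,055 ops/s gives about +0.252%. That is well below the +1.0% bar, hence NO-GO.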

## Root Cause

Overlapping optimizations have plateaued:
1. Phase 9/10 MONO LEGACY (+1.89%) already captures most of the free-path benefit
2. The remaining margin is too small to overcome:
   - Two branch checks (mask_enabled + has_class)
   - I-cache layout tax in hot path
   - Direct function call overhead

## Phase 85 vs Phase 86

| Metric | Phase 85 | Phase 86 |
|--------|----------|----------|
| Approach | Indirect calls + table | Bitset mask + direct call |
| Result | -0.86% | +0.25% |
| Verdict | NO-GO (regression) | NO-GO (insufficient) |

Phase 86 correctly avoided the indirect-call penalty but revealed an architectural
limit: it cannot escape the Phase 9/10 overlay without restructuring.

## Recommendation

The free-path optimization layer has reached a practical ceiling:
- Phase 9/10 +1.89% + Phase 6/19/FASTLANE +16-27% ≈ 18-29% total
- Further ceremony-elimination attempts face the same constraints
- Recommend focusing on other optimization layers (malloc, etc.)
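The ≈18-29% total is a simple sum of the two ranges. The commit does not say how the gains combine. If they are instead assumed to compound multiplicatively, the totals come out nearly the same:

```c
/* Combine two speedups given in percent, two different ways. */
static double additive_pct(double a, double b) {
    return a + b;
}

/* Assumes the gains are independent and multiply. */
static double compounded_pct(double a, double b) {
    return ((1.0 + a / 100.0) * (1.0 + b / 100.0) - 1.0) * 100.0;
}
```

The two models compare as follows:
- 1.89% with 16%: 17.89% additive vs. about 18.19% compounded.
- 1.89% with 27%: 28.89% additive vs. about 29.40% compounded.

Either way the ceiling estimate stands.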

## Files Changed

### New
- core/box/free_path_legacy_mask_box.h (API + globals)
- core/box/free_path_legacy_mask_box.c (refresh logic)

### Modified
- core/bench_profile.h (added refresh call)
- core/front/malloc_tiny_fast.h (added Phase 86 fast path check)
- Makefile (added object files)
- CURRENT_TASK.md (documented result)

All changes are conditional on HAKMEM_FREE_PATH_LEGACY_MASK=1 (default OFF).
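The gating rules above can be sketched as a refresh function: default OFF, and fail fast under LARSON_FIX=1. This is a hypothetical illustration, not the code in free_path_legacy_mask_box.c:
- The variable name LARSON_FIX is taken literally from the summary and may differ from the real knob.
- The sketch reports the conflict with a -1 return and leaves the abort decision to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True only when the variable is set to exactly "1". */
static bool env_is_one(const char *name) {
    const char *v = getenv(name);
    return v != NULL && strcmp(v, "1") == 0;
}

/* Hypothetical refresh:
 *    0    -> mask disabled (default OFF)
 *    0x7f -> mask enabled for C0-C6
 *   -1    -> config conflict; a fail-fast caller would abort here */
static int legacy_mask_refresh_sketch(void) {
    if (!env_is_one("HAKMEM_FREE_PATH_LEGACY_MASK")) return 0;
    if (env_is_one("LARSON_FIX")) {
        fprintf(stderr, "[free_legacy_mask] LARSON_FIX=1 is incompatible (cross-thread validation)\n");
        return -1;
    }
    return 0x7f;
}
```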

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2025-12-18 22:05:34 +09:00


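Bench Reproducibility SSOT (the bare minimum to keep results from flip-flopping)

Goal: stamp out non-reproducible benchmarks, the most painful problem when development is about squeezing out a few percent.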

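1) Conclusion first (common causes)

Even on the same machine, results routinely move by 5-15% when any of the following changes:

  • CPU power/thermal state (governor / EPP / turbo)
  • HAKMEM_PROFILE left unspecified (the route changes)
  • Leftover exports (ENV from earlier runs is still set)
  • Comparing different binaries (layout tax: code placement in .text changes)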
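The "leftover exports" cause is easy to catch before a run by scanning the inherited environment. Below is a minimal sketch. It assumes the knobs of interest share the HAKMEM_ prefix, which holds for the HAKMEM_* settings named in this document; RUNS, WS, and LARSON_FIX would need separate checks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Count inherited variables whose NAME=value string starts with prefix;
   print each one to out if out is non-NULL. */
static size_t count_env_with_prefix(const char *prefix, FILE *out) {
    size_t n = 0;
    size_t len = strlen(prefix);
    for (char **e = environ; e != NULL && *e != NULL; e++) {
        if (strncmp(*e, prefix, len) == 0) {
            if (out != NULL) fprintf(out, "leftover: %s\n", *e);
            n++;
        }
    }
    return n;
}
```

A non-zero count for "HAKMEM_" in a shell that is about to launch an SSOT run is a sign the environment is not clean.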

2) SSOT (the source of truth for optimization decisions)

  • Runner: scripts/run_mixed_10_cleanenv.sh
  • Required:
    • Set HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE explicitly
    • RUNS=10 (averages out noise)
    • WS=400 (SSOT)
  • Optional (for isolating causes):
    • HAKMEM_BENCH_ENV_LOG=1 (logs CPU governor/EPP/freq)
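A pre-flight check for the required settings can be sketched as below. It assumes RUNS and WS reach the runner as environment variables. The list suggests this but does not state it. The expected values are copied from the list.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ssot_req {
    const char *name;
    const char *want;
};

/* Required SSOT settings (values from the list above). */
static const struct ssot_req k_ssot_required[] = {
    { "HAKMEM_PROFILE", "MIXED_TINYV3_C7_SAFE" },
    { "RUNS", "10" },
    { "WS", "400" },
};

/* Returns how many required settings are missing or differ;
   reports each mismatch to out if out is non-NULL. */
static int ssot_preflight(FILE *out) {
    int bad = 0;
    size_t n = sizeof k_ssot_required / sizeof k_ssot_required[0];
    for (size_t i = 0; i < n; i++) {
        const char *have = getenv(k_ssot_required[i].name);
        if (have == NULL || strcmp(have, k_ssot_required[i].want) != 0) {
            if (out != NULL)
                fprintf(out, "SSOT: %s=%s (want %s)\n", k_ssot_required[i].name,
                        have != NULL ? have : "<unset>", k_ssot_required[i].want);
            bad++;
        }
    }
    return bad;
}
```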

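3) Reference (the source of truth for cross-allocator comparisons)

Allocator comparisons include layout tax, so they only count as reference. To make them fairer, measure with the same binary:

  • Same-binary runner: scripts/run_allocator_preload_matrix.sh
    • Keeps bench_random_mixed_system fixed and swaps allocators via LD_PRELOAD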
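The same-binary idea can be sketched in C with fork/execve. The benchmark binary stays fixed, and only LD_PRELOAD changes between runs. This is a hypothetical illustration of the principle, not the contents of scripts/run_allocator_preload_matrix.sh. It also passes an explicit minimal environment (PATH plus LD_PRELOAD), an assumption in the spirit of the cleanenv runner.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `binary` once. preload == NULL means system malloc (no LD_PRELOAD).
   Returns the child's exit status (127 if exec failed), or -1 on error. */
static int run_with_preload(const char *binary, const char *preload) {
    char preload_var[4096];
    char *envp[3];
    int k = 0;

    envp[k++] = "PATH=/usr/bin:/bin";
    if (preload != NULL) {
        int w = snprintf(preload_var, sizeof preload_var, "LD_PRELOAD=%s", preload);
        if (w < 0 || (size_t)w >= sizeof preload_var) return -1; /* path too long */
        envp[k++] = preload_var;
    }
    envp[k] = NULL;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        char *argv[] = { (char *)binary, NULL };
        execve(binary, argv, envp); /* same binary every time */
        _exit(127);                 /* reached only if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A matrix run would call this once per allocator: NULL for system malloc, then each allocator's shared-object path (environment-specific and not given here). A real run would also pass HAKMEM_PROFILE and the other SSOT settings.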

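4) Stopping the flip-flopping in practice (the minimum ritual)

  1. Always run SSOT measurements with a clean environment:
    • scripts/run_mixed_10_cleanenv.sh
  2. Keep an environment log on every run:
    • HAKMEM_BENCH_ENV_LOG=1
  3. Save results to files (in a form you can trace later):
    • Use scripts/bench_ssot_capture.sh (saves the git sha, env, and bench output together)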
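The capture step can be pictured as writing one self-describing record per run. The record layout below is a hypothetical format chosen for illustration: git_sha=, then env.NAME=, then the raw output. The actual output of scripts/bench_ssot_capture.sh is not shown in this document.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write one record: git sha, the listed env variables, raw bench output.
   Returns 0 on success, -1 on a write error. */
static int write_capture(FILE *out, const char *git_sha,
                         const char *const *env_names, size_t n_env,
                         const char *bench_output) {
    if (fprintf(out, "git_sha=%s\n", git_sha) < 0) return -1;
    for (size_t i = 0; i < n_env; i++) {
        const char *v = getenv(env_names[i]);
        if (fprintf(out, "env.%s=%s\n", env_names[i], v != NULL ? v : "<unset>") < 0)
            return -1;
    }
    if (fprintf(out, "--- bench output ---\n%s\n", bench_output) < 0) return -1;
    return 0;
}
```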

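5) Important note (AMD pstate EPP)

In an amd-pstate-epp environment, benchmarks can skew toward the "slow" side if these settings are left as-is:

  • governor=powersave
  • energy_performance_preference=power

Check the HAKMEM_BENCH_ENV_LOG=1 output first, and compare only runs taken under the same conditions.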
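The two settings above can be read from the standard cpufreq sysfs files:
- /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
- /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference

The sketch below assumes that layout and does not describe what HAKMEM_BENCH_ENV_LOG=1 actually prints. It falls back to "unknown" when a file is missing, for example on a kernel without amd-pstate-epp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a (sysfs) file into buf, newline stripped.
   On failure writes "unknown" into buf and returns -1. */
static int read_sysfs_line(const char *path, char *buf, size_t len) {
    FILE *f = fopen(path, "r");
    if (f == NULL || fgets(buf, (int)len, f) == NULL) {
        if (f != NULL) fclose(f);
        snprintf(buf, len, "unknown");
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Flags the combination the note above warns about. */
static bool epp_is_slow_side(const char *governor, const char *epp) {
    return strcmp(governor, "powersave") == 0 && strcmp(epp, "power") == 0;
}
```

Two runs should not be compared if epp_is_slow_side() returns true for one of them and false for the other.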

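6) External review (paste packets)

For the "compress the code and paste it" use case, use the packet generator to cut down on manual work each time:

  • Generator script: scripts/make_chatgpt_pro_packet_free_path.sh
  • Output (snapshot): docs/analysis/FREE_PATH_REVIEW_PACKET_CHATGPT.md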