hakmem/docs/analysis/PHASE52_TAIL_LATENCY_PROXY_INSTRUCTIONS.md
Moe Charm (CI) 7adbcdfcb6 Phase 54-60: Memory-Lean mode, Balanced mode stabilization, M1 (50%) achievement
## Summary

Completed Phase 54-60 optimization work:

**Phase 54-56: Memory-Lean mode (LEAN+OFF prewarm suppression)**
- Implemented ss_mem_lean_env_box.h with ENV gates
- Balanced mode (LEAN+OFF) promoted as production default
- Result: +1.2% throughput, better stability, zero syscall overhead
- Added to bench_profile.h: MIXED_TINYV3_C7_BALANCED preset

**Phase 57: 60-min soak finalization**
- Balanced mode: 60-min soak, RSS drift 0%, CV 5.38%
- Speed-first mode: 60-min soak, RSS drift 0%, CV 1.58%
- Syscall budget: 1.25e-7/op (800× under target)
- Status: PRODUCTION-READY

**Phase 59: 50% recovery baseline rebase**
- hakmem FAST (Balanced): 59.184M ops/s, CV 1.31%
- mimalloc: 120.466M ops/s, CV 3.50%
- Ratio: 49.13% (M1 ACHIEVED within statistical noise)
- Superior stability: 2.68× better CV than mimalloc

**Phase 60: Alloc pass-down SSOT (NO-GO)**
- Implemented alloc_passdown_ssot_env_box.h
- Modified malloc_tiny_fast.h for SSOT pattern
- Result: -0.46% (NO-GO)
- Key lesson: SSOT not applicable where early-exit already optimized

## Key Metrics

- Performance: 49.13% of mimalloc (M1 effectively achieved)
- Stability: CV 1.31% (superior to mimalloc 3.50%)
- Syscall budget: 1.25e-7/op (excellent)
- RSS: 33MB stable, 0% drift over 60 minutes

## Files Added/Modified

New boxes:
- core/box/ss_mem_lean_env_box.h
- core/box/ss_release_policy_box.{h,c}
- core/box/alloc_passdown_ssot_env_box.h

Scripts:
- scripts/soak_mixed_single_process.sh
- scripts/analyze_epoch_tail_csv.py
- scripts/soak_mixed_rss.sh
- scripts/calculate_percentiles.py
- scripts/analyze_soak.py

Documentation: Phase 40-60 analysis documents

## Design Decisions

1. Profile separation (core/bench_profile.h):
   - MIXED_TINYV3_C7_SAFE: Speed-first (no LEAN)
   - MIXED_TINYV3_C7_BALANCED: Balanced mode (LEAN+OFF)

2. Box Theory compliance:
   - All ENV gates reversible (HAKMEM_SS_MEM_LEAN, HAKMEM_ALLOC_PASSDOWN_SSOT)
   - Single conversion points maintained
   - No physical deletions (compile-out only)

3. Lessons learned:
   - SSOT effective only where redundancy exists (Phase 60 showed limits)
   - Branch prediction extremely effective (~0 cycles for well-predicted branches)
   - Early-exit pattern valuable even when seemingly redundant

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-17 06:24:01 +09:00


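# Phase 52: Tail Latency Proxy (epoch throughput quantiles)
Goal: approximate the alloc/free tail **without any code changes** and record it in the SSOT.
Approach: collect a large number of **short epochs** from a single-process soak, and use the distribution of epoch throughput (p50/p90/p99/p999) as a proxy for the tail.
Rationale:
- Adding a per-op latency histogram would introduce significant measurement overhead
- Epoch output (`[EPOCH] ... Throughput ... rss_kb=...`) already exists, so a tail proxy can be obtained at low cost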
---
## Step 0: Build
```bash
make bench_random_mixed_hakmem_minimal
make bench_random_mixed_system
make bench_random_mixed_mi
```
---
## Step 1: Single-process epoch soak (5 minutes, epoch = 1 second)
hakmem FAST:
```bash
BENCH_BIN=./bench_random_mixed_hakmem_minimal \
DURATION_SEC=300 EPOCH_SEC=1 WS=400 \
scripts/soak_mixed_single_process.sh > tail_epoch_hakmem_fast_5m.csv
```
mimalloc / system:
```bash
BENCH_BIN=./bench_random_mixed_mi \
DURATION_SEC=300 EPOCH_SEC=1 WS=400 \
scripts/soak_mixed_single_process.sh > tail_epoch_mimalloc_5m.csv
BENCH_BIN=./bench_random_mixed_system \
DURATION_SEC=300 EPOCH_SEC=1 WS=400 \
scripts/soak_mixed_single_process.sh > tail_epoch_system_5m.csv
```
Notes:
- `EPOCH_SEC=1` yields ~300 samples (enough to see the neighborhood of p99)
- To look further into the tail, use `DURATION_SEC=1800` (30 minutes, ~1800 samples)
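
The sample counts above follow from `DURATION_SEC / EPOCH_SEC`. As a rough sanity check, the sketch below estimates how many epochs are expected to land beyond a given quantile in one run. The helper name `tail_samples` is hypothetical and not part of the repository scripts. With 300 epochs only about 3 fall beyond p99 and fewer than 1 beyond p999, which is why longer runs are needed to see further into the tail.

```python
import math


def tail_samples(duration_sec: int, epoch_sec: int, quantile: float) -> float:
    """Expected number of epochs beyond `quantile` (e.g. 0.99) in one run."""
    n_epochs = duration_sec // epoch_sec  # one sample per epoch
    return n_epochs * (1.0 - quantile)


# Compare the 5-minute and 30-minute runs from Step 1 (EPOCH_SEC=1).
for duration in (300, 1800):
    for q in (0.99, 0.999):
        n_tail = tail_samples(duration, 1, q)
        print(f"duration={duration}s quantile={q}: ~{n_tail:.1f} epochs in the tail")
```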
---
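## Step 2: Aggregation (p50/p90/p99/p999)
Compute the quantiles from the CSV.
Important:
- **The throughput tail is the "low side"**: look at p1/p0.1 (p99 is the "fast side")
- **Compute latency percentiles only after deriving a per-epoch latency** (`p99(latency) ≠ 1/p99(throughput)`)
Recommended (correct calculation):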
```bash
python3 scripts/analyze_epoch_tail_csv.py tail_epoch_hakmem_fast_5m.csv
```
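To illustrate why `p99(latency) ≠ 1/p99(throughput)`, here is a minimal sketch on synthetic numbers. It is not the implementation of `scripts/analyze_epoch_tail_csv.py`. The nearest-rank `percentile` helper, the throughput values, and the definition of per-epoch latency as average nanoseconds per operation (`1e9 / throughput`) are all assumptions made for illustration.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic epoch throughputs in ops/s (illustrative, not measured data):
# 98 normal epochs plus 2 slow ones.
throughput = [60e6] * 98 + [30e6, 20e6]

# Assumed per-epoch latency proxy: average nanoseconds per op in each epoch.
latency_ns = [1e9 / t for t in throughput]

# Correct: take the percentile of the derived latencies (slow side).
p99_latency = percentile(latency_ns, 99)

# Incorrect: inverting p99 throughput lands on the fast side of the distribution.
naive = 1e9 / percentile(throughput, 99)

print(f"p99(latency)      = {p99_latency:.2f} ns/op")
print(f"1/p99(throughput) = {naive:.2f} ns/op")
```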
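Recommendations:
- For p50/p90/p99, 5 minutes is sufficient
- For p999, 30 minutes or more is recommended (otherwise there are too few samples)

(If an analysis script exists, use it. If not, add a simple awk/python script in Phase 52-1.)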
---
## Step 3: Update the SSOT
File to update:
- `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
Add:
- The tail proxy (p99 epoch throughput / p99 latency proxy)
- The conditions under which it was measured (FAST/WS/epoch_sec/duration)