hakmem/docs/analysis/PHASE53_RSS_TAX_TRIAGE_INSTRUCTIONS.md
Moe Charm (CI) 7adbcdfcb6 Phase 54-60: Memory-Lean mode, Balanced mode stabilization, M1 (50%) achievement
## Summary

Completed Phase 54-60 optimization work:

**Phase 54-56: Memory-Lean mode (LEAN+OFF prewarm suppression)**
- Implemented ss_mem_lean_env_box.h with ENV gates
- Balanced mode (LEAN+OFF) promoted as production default
- Result: +1.2% throughput, better stability, zero syscall overhead
- Added to bench_profile.h: MIXED_TINYV3_C7_BALANCED preset

**Phase 57: 60-min soak finalization**
- Balanced mode: 60-min soak, RSS drift 0%, CV 5.38%
- Speed-first mode: 60-min soak, RSS drift 0%, CV 1.58%
- Syscall budget: 1.25e-7/op (800× under target)
- Status: PRODUCTION-READY

**Phase 59: 50% recovery baseline rebase**
- hakmem FAST (Balanced): 59.184M ops/s, CV 1.31%
- mimalloc: 120.466M ops/s, CV 3.50%
- Ratio: 49.13% (M1 ACHIEVED within statistical noise)
- Superior stability: 2.68× better CV than mimalloc

**Phase 60: Alloc pass-down SSOT (NO-GO)**
- Implemented alloc_passdown_ssot_env_box.h
- Modified malloc_tiny_fast.h for SSOT pattern
- Result: -0.46% (NO-GO)
- Key lesson: SSOT not applicable where early-exit already optimized

## Key Metrics

- Performance: 49.13% of mimalloc (M1 effectively achieved)
- Stability: CV 1.31% (superior to mimalloc 3.50%)
- Syscall budget: 1.25e-7/op (excellent)
- RSS: 33MB stable, 0% drift over 60 minutes

## Files Added/Modified

New boxes:
- core/box/ss_mem_lean_env_box.h
- core/box/ss_release_policy_box.{h,c}
- core/box/alloc_passdown_ssot_env_box.h

Scripts:
- scripts/soak_mixed_single_process.sh
- scripts/analyze_epoch_tail_csv.py
- scripts/soak_mixed_rss.sh
- scripts/calculate_percentiles.py
- scripts/analyze_soak.py

Documentation: Phase 40-60 analysis documents

## Design Decisions

1. Profile separation (core/bench_profile.h):
   - MIXED_TINYV3_C7_SAFE: Speed-first (no LEAN)
   - MIXED_TINYV3_C7_BALANCED: Balanced mode (LEAN+OFF)

2. Box Theory compliance:
   - All ENV gates reversible (HAKMEM_SS_MEM_LEAN, HAKMEM_ALLOC_PASSDOWN_SSOT)
   - Single conversion points maintained
   - No physical deletions (compile-out only)

3. Lessons learned:
   - SSOT effective only where redundancy exists (Phase 60 showed limits)
   - Branch prediction extremely effective (~0 cycles for well-predicted branches)
   - Early-exit pattern valuable even when seemingly redundant

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-17 06:24:01 +09:00

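# Phase 53: RSS Tax Triage (isolating the cause of 33MB vs 2MB)
Purpose: Determine whether the hakmem peak RSS of ≈33MB established in Phase 51 comes from (A) the benchmark's warmup/prefault or (B) the allocator's design (resident superslabs/metadata).
Goal:
- Separate "RSS that can be reduced" from "RSS retained for speed", and decide on a policy (profile)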
---
## Step 0: Baseline check (same conditions as Phase 51)
```bash
make bench_random_mixed_hakmem_minimal
BENCH_BIN=./bench_random_mixed_hakmem_minimal \
DURATION_SEC=300 EPOCH_SEC=5 WS=400 \
scripts/soak_mixed_single_process.sh > soak_single_hakmem_fast_5m.csv
```
---
## Step 1: Effect of prefault (most important)
When `HAKMEM_BENCH_PREFAULT` is unset, the bench uses "cycles/10" for warmup.
Because `cycles` is large in the single-process soak, prefault may be pushing RSS up.
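The default rule above can be sketched as a small shell helper. This is only an illustration of the stated behavior, not the benchmark's actual source: `effective_prefault` is a hypothetical name, and treating an empty-but-set variable as "set" is an assumption.

```bash
# Hypothetical sketch of the warmup rule described above (not the bench source).
# If HAKMEM_BENCH_PREFAULT is set, it is used as-is; otherwise cycles/10 is used.
effective_prefault() {
  local cycles="$1"
  # ${VAR+set} expands to "set" whenever VAR is defined, even if empty
  # (assumption: the bench only distinguishes set vs. unset).
  if [ -n "${HAKMEM_BENCH_PREFAULT+set}" ]; then
    echo "$HAKMEM_BENCH_PREFAULT"
  else
    echo $(( cycles / 10 ))
  fi
}
```

Under this rule, a run with a very large `cycles` value and no explicit setting gets a proportionally large warmup, which is why 1-A and 1-B below pin the value explicitly.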
### 1-A) prefault OFF
```bash
HAKMEM_BENCH_PREFAULT=0 \
BENCH_BIN=./bench_random_mixed_hakmem_minimal \
DURATION_SEC=300 EPOCH_SEC=5 WS=400 \
scripts/soak_mixed_single_process.sh > soak_single_hakmem_fast_5m_prefault0.csv
```
### 1-B) prefault ON (fixed at a small value)
```bash
HAKMEM_BENCH_PREFAULT=20000000 \
BENCH_BIN=./bench_random_mixed_hakmem_minimal \
DURATION_SEC=300 EPOCH_SEC=5 WS=400 \
scripts/soak_mixed_single_process.sh > soak_single_hakmem_fast_5m_prefault20m.csv
```
Verdict:
- If RSS drops significantly, the "benchmark-induced tax" is the main cause
- If RSS barely changes, the "allocator design (resident metadata/segments)" is the main cause
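The verdict can be made mechanical by comparing peak RSS across the CSVs. The following is a minimal sketch under loud assumptions: the CSV has a header row with a column named `rss_mb` (hypothetical; check the real header written by `scripts/soak_mixed_single_process.sh`), and the 50% / 10% thresholds are arbitrary stand-ins for "drops significantly" and "barely changes". `peak_column` and `classify_rss_tax` are illustrative names, not existing scripts.

```bash
# Hypothetical helper: print the maximum value of a named CSV column.
# Assumes a comma-separated file with a header row.
peak_column() {
  local file="$1" column="$2"
  awk -F',' -v col="$column" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx && $idx + 0 > max { max = $idx + 0 }
    END { if (!idx) exit 1; print max + 0 }
  ' "$file"
}

# Hypothetical classifier: compare baseline peak RSS with the prefault-OFF peak.
# Thresholds (>= 50% drop, < 10% drop) are assumptions, not values from this doc.
classify_rss_tax() {
  local base="$1" prefault_off="$2"
  awk -v b="$base" -v p="$prefault_off" 'BEGIN {
    if (b <= 0) { print "inconclusive"; exit }
    drop = (b - p) / b
    if (drop >= 0.5)      print "bench-induced"
    else if (drop < 0.1)  print "allocator-design"
    else                  print "inconclusive"
  }'
}

# Example usage (file names from Step 0 and Step 1-A; column name assumed):
#   base=$(peak_column soak_single_hakmem_fast_5m.csv rss_mb)
#   off=$(peak_column soak_single_hakmem_fast_5m_prefault0.csv rss_mb)
#   classify_rss_tax "$base" "$off"
```

An "inconclusive" result suggests both factors contribute, in which case the Step 2 breakdown is the more useful signal.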
---
## Step 2: Internal memory statistics (OBSERVE build)
Purpose: Observe "which box is holding the memory" (throughput is not evaluated here).
```bash
make bench_random_mixed_hakmem_observe
HAKMEM_TINY_MEM_DUMP=1 HAKMEM_SS_STATS_DUMP=1 HAKMEM_WARM_POOL_STATS=1 \
./bench_random_mixed_hakmem_observe 20000000 400 1
```
Observe:
- The breakdown across Tiny mem stats / superslab stats / warm pool stats
---
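## Step 3: Policy (profile)
Branches:
- **Speed-first**: RSS is tolerated (do not add syscalls; prioritize long-run stability)
- **Memory-lean**: Lower RSS at the cost of throughput and/or additional syscalls (to be turned into a profile in a future Phase 54)
SSOT:
- Record the "peak RSS tax (Mixed/WS=400)" and the chosen policy in `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`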