Phase POLICY-FAST-PATH-V2 complete + MID-V35-HOTPATH-OPT-1 design

## Phase POLICY-FAST-PATH-V2 (FROZEN)
- Implementation complete: free_policy_fast_v2_box.h + malloc_tiny_fast.h integration
- A/B Results:
  - Mixed (ws=400): -1.6% regression ❌ (branch cost > skip benefit)
  - C6-heavy (ws=200): +5.4% improvement ✅
- Decision: Default OFF, FROZEN (ws<300 / C6-heavy research only)
- Learning: Large WS causes branch misprediction to dominate

## Phase 3-GRADUATE + ENV probe fix
- 64-probe retry for getenv() stability during bench_profile putenv()
- C6 ULTRA intrusive freelist: FROZEN (research box)

## Phase MID-V35-HOTPATH-OPT-1-DESIGN
- Design doc for next optimization target
- Target: MID v3.5 alloc/free hot path (C5-C6)
- Boxes: Stats Gate, TLS Layout, Boundary Check elimination
- Expected: +3-9% on Mixed mainline

Files:
- core/box/free_policy_fast_v2_box.h (new)
- core/box/free_path_stats_box.h/c (policy_fast_v2_skip counter)
- core/front/malloc_tiny_fast.h (fast-path integration)
- docs/analysis/MID_V35_HOTPATH_OPT_1_DESIGN.md (new)
- docs/analysis/PHASE_3_GRADUATE_*.md (new)
- CURRENT_TASK.md (phase status update)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -48,6 +48,18 @@ HAKMEM_TINY_HEAP_STATS=1
HAKMEM_TINY_HEAP_STATS_DUMP=1
HAKMEM_SMALL_HEAP_V3_STATS=1
```
- **Phase POLICY-FAST-PATH-V2** (FROZEN - research only):
  ```sh
  HAKMEM_TINY_FREE_POLICY_FAST_V2=1  # Fast-path free optimization
  ```
  - **Status**: Default OFF, FROZEN (merge complete)
  - **Actual Results** (Phase POLICY-FAST-PATH-V2 A/B):
    - Mixed (ws=400): **-1.6%** regression ❌ (added branch cost > skip benefit)
    - C6-heavy (ws=200): **+5.4%** improvement ✅
  - **Finding**: Large working set (ws>300) causes branch misprediction cost to dominate
  - **Recommendation**: Use only for C6-heavy or ws<300 research benchmarks
  - **NOT recommended for**: MIXED_TINYV3_C7_SAFE mainline (keep OFF)
  - **Requirement**: Only effective when v7 Learner is disabled
- Do not touch the v2 family (under C7_SAFE, Pool v2 / Tiny v2 are always OFF).
- Example experiment that touches FREE_POLICY/THP (not required at the current HEAD; some combinations can even come out slightly negative):
```sh
@@ -332,6 +344,71 @@ HAKMEM_BENCH_MIN_SIZE=200 HAKMEM_BENCH_MAX_SIZE=500 \

---

## Research Profile 4: C6_ULTRA_INTRUSIVE_EXPERIMENT_V12 (C6 ULTRA intrusive LIFO vs array magazine, Phase TLS-UNIFY-3)

**FROZEN - Research Only**: Phase TLS-UNIFY-3 validation complete.
Findings:
- C6-heavy (257-512B): +3.8% improvement ✅ (ULTRA+array; ULTRA+intrusive measured -1.44%)
- Mixed (16-1024B): -12% to -14% regression ❌ (policy overhead + TLS contention)
- Recommendation: Use only for C6-heavy workloads or research/debugging
- Default: OFF (MID v3/v3.5 faster for Mixed)

### Purpose
- **Phase TLS-UNIFY-3 validation**: compare the C6 ULTRA intrusive LIFO freelist against the array magazine.
- Route C6 to the ULTRA path and A/B only the LIFO representation inside TLS.
- ULTRA routing overrides MID v3/v3.5, so use it only in a research context.

### Measured Performance
- C6-heavy (257-512B, 1M iter, ws=200, 5-run mean):
  - Baseline (C6=MID v3.5): 55.3M ops/s
  - ULTRA+array (intrusive OFF): 57.4M ops/s (+3.79% vs Baseline)
  - ULTRA+intrusive (intrusive ON): 54.5M ops/s (-1.44% vs Baseline, ✅ PASS)
  - IFL stats: push=265,890 / pop=265,815 / fallback=0 (perfect LIFO behavior)
- Mixed 16–1024B (standard mainline):
  - **ULTRA+intrusive shows a regression of about -14%.**
  - Root cause:
    - All 8 classes (C0–C7) compete within a single TLS, so the ULTRA TLS for C4/C5/C6/C7 (about 2KB) ends up being fought over.
    - ULTRA misses increase, and Legacy fallback reaches about 24%.
  - **Conclusion**: Do not use C6 ULTRA on the Mixed mainline (as designed in `MIXED_TINYV3_C7_SAFE`).
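
The "vs Baseline" percentages above are plain relative changes in throughput. A minimal sketch of that arithmetic follows, using the rounded M ops/s figures from the list. `pct_delta` is a hypothetical helper, not something in the hakmem tree, and it prints one decimal place because the two-decimal figures in the list were presumably derived from unrounded measurements.

```sh
# pct_delta BASE NEW: relative change of NEW vs BASE, in percent.
# Hypothetical helper for illustration; not part of the hakmem tree.
pct_delta() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%+.1f\n", (new - base) / base * 100 }'
}

pct_delta 55.3 57.4   # ULTRA+array vs Baseline     -> +3.8
pct_delta 55.3 54.5   # ULTRA+intrusive vs Baseline -> -1.4
```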

### ENV (ULTRA intrusive opt-in)
```sh
HAKMEM_BENCH_MIN_SIZE=257
HAKMEM_BENCH_MAX_SIZE=512
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1   # ★ route C6 to the ULTRA path
HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1   # ★ enable the intrusive LIFO freelist
HAKMEM_FREE_PATH_STATS=1              # for collecting stats
```

### Test Commands
```sh
# Baseline: C6=MID v3.5 (no ULTRA routing)
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
./bench_random_mixed_hakmem 1000000 200 1

# ULTRA+array: array magazine (intrusive OFF)
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=0 \
HAKMEM_FREE_PATH_STATS=1 ./bench_random_mixed_hakmem 1000000 200 1

# ULTRA+intrusive: intrusive LIFO freelist
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1 \
HAKMEM_FREE_PATH_STATS=1 ./bench_random_mixed_hakmem 1000000 200 1
```
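
The reported figures are 5-run means, while each command above is a single run. One way to repeat a command and average the results is sketched below. `run_n` and `mean_of` are hypothetical helpers, and because the output format of `bench_random_mixed_hakmem` is not shown here, extracting the ops/s value from its output is left as an assumption (`mean_of` expects one number per line).

```sh
# run_n N CMD...: run CMD N times in sequence (hypothetical helper).
run_n() {
  n=$1; shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@"
    i=$((i + 1))
  done
}

# mean_of: average the first field of each stdin line (hypothetical helper).
mean_of() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Made-up sample values, not measured data:
printf '%s\n' 10 20 30 40 50 | mean_of   # -> 30.00

# Assumed usage; the filter that pulls ops/s out of the benchmark output
# must be written against the real output format:
# run_n 5 env HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
#   ./bench_random_mixed_hakmem 1000000 200 1 | <extract ops/s> | mean_of
```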

### Expected Values
- ULTRA+intrusive >= Baseline (or small regression < 5%)
- c6_ifl_fallback ≈ 0 (intrusive LIFO is working correctly)
- c6_ultra_free/alloc > 0 (ULTRA path is active)
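
The three expectations above can be checked mechanically once the numbers are in hand. The sketch below assumes the throughputs and counters have already been read out of the stats dump by hand. `check_intrusive` is a hypothetical helper, "≈ 0" is interpreted strictly as `== 0`, and the last argument in the example call is a placeholder rather than a measured counter.

```sh
# check_intrusive BASE NEW FALLBACK ULTRA_OPS
#   BASE, NEW  : throughput (M ops/s) for Baseline and ULTRA+intrusive
#   FALLBACK   : c6_ifl_fallback counter
#   ULTRA_OPS  : c6_ultra_free/alloc counter
# Prints PASS/FAIL and returns 0 on PASS, 1 on FAIL. Hypothetical helper.
check_intrusive() {
  awk -v base="$1" -v new="$2" -v fb="$3" -v ultra="$4" 'BEGIN {
    delta = (new - base) / base * 100
    # Regression must stay under 5%, no fallbacks, ULTRA path exercised.
    ok = (delta > -5) && (fb == 0) && (ultra > 0)
    printf "%s delta=%+.1f%% fallback=%d\n", (ok ? "PASS" : "FAIL"), delta, fb
    exit (ok ? 0 : 1)
  }'
}

# Throughputs and fallback=0 from the measured results; "1" is a placeholder.
check_intrusive 55.3 54.5 0 1   # -> PASS delta=-1.4% fallback=0
```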

### Notes
- **WARNING**: ULTRA routing overrides MID v3/v3.5 - use only in research context.
- **Usage**: Use as a research box dedicated to C6-heavy workloads (not recommended on the Mixed mainline, where it regresses).
- Do not ship it on the mainline; treat it as a research box.

---

### Common Notes
- Stacking one-off ENV variables on top of a preset makes results hard to reproduce, so start from one of the profiles above and always note what you changed.
- The v2 family (Pool v2 / Tiny v2) is opt-in per benchmark. Keep it at 0 unless needed.