Phase POLICY-FAST-PATH-V2 complete + MID-V35-HOTPATH-OPT-1 design

## Phase POLICY-FAST-PATH-V2 (FROZEN)
- Implementation complete: free_policy_fast_v2_box.h + malloc_tiny_fast.h integration
- A/B Results:
  - Mixed (ws=400): -1.6% regression (branch cost > skip benefit)
  - C6-heavy (ws=200): +5.4% improvement
- Decision: Default OFF, FROZEN (ws<300 / C6-heavy research only)
- Learning: With a large working set (ws>300), branch misprediction cost dominates

## Phase 3-GRADUATE + ENV probe fix
- 64-probe retry for getenv() stability during bench_profile putenv()
- C6 ULTRA intrusive freelist: FROZEN (research box)

## Phase MID-V35-HOTPATH-OPT-1-DESIGN
- Design doc for next optimization target
- Target: MID v3.5 alloc/free hot path (C5-C6)
- Boxes: Stats Gate, TLS Layout, Boundary Check elimination
- Expected: +3-9% on Mixed mainline

Files:
- core/box/free_policy_fast_v2_box.h (new)
- core/box/free_path_stats_box.h/c (policy_fast_v2_skip counter)
- core/front/malloc_tiny_fast.h (fast-path integration)
- docs/analysis/MID_V35_HOTPATH_OPT_1_DESIGN.md (new)
- docs/analysis/PHASE_3_GRADUATE_*.md (new)
- CURRENT_TASK.md (phase status update)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Moe Charm (CI)
2025-12-12 18:40:08 +09:00
parent 0c8583f91e
commit e95e61f0ff
13 changed files with 1099 additions and 53 deletions

@@ -48,6 +48,18 @@ HAKMEM_TINY_HEAP_STATS=1
HAKMEM_TINY_HEAP_STATS_DUMP=1
HAKMEM_SMALL_HEAP_V3_STATS=1
```
- **Phase POLICY-FAST-PATH-V2** (FROZEN - research only):
  ```sh
  HAKMEM_TINY_FREE_POLICY_FAST_V2=1 # Fast-path free optimization
  ```
  - **Status**: Default OFF, FROZEN (merge complete)
  - **Actual Results** (Phase POLICY-FAST-PATH-V2 A/B):
    - Mixed (ws=400): **-1.6%** regression ❌ (added branch cost > skip benefit)
    - C6-heavy (ws=200): **+5.4%** improvement ✅
  - **Finding**: Large working set (ws>300) causes branch misprediction cost to dominate
  - **Recommendation**: Use only for C6-heavy or ws<300 research benchmarks
  - **NOT recommended for**: MIXED_TINYV3_C7_SAFE mainline (keep OFF)
  - **Requirement**: Only effective when v7 Learner is disabled
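- Do not touch the v2 family (under C7_SAFE, Pool v2 / Tiny v2 are always OFF).
- Example experiment that touches FREE_POLICY/THP (not required at the current HEAD; some combinations can come out slightly negative):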
```sh
@@ -332,6 +344,71 @@ HAKMEM_BENCH_MIN_SIZE=200 HAKMEM_BENCH_MAX_SIZE=500 \
---
## Research Profile 4: C6_ULTRA_INTRUSIVE_EXPERIMENT_V12C6 ULTRA intrusive LIFO vs array magazine, Phase TLS-UNIFY-3
**FROZEN - Research Only**: Phase TLS-UNIFY-3 validation complete.
Findings:
- C6-heavy (257-512B): +3.8% improvement
- Mixed (16-1024B): -12% to -14% regression (policy overhead + TLS contention)
- Recommendation: Use only for C6-heavy workloads or research/debugging
- Default: OFF (MID v3/v3.5 faster for Mixed)
### Purpose
- **Phase TLS-UNIFY-3 validation**: compare the C6 ULTRA intrusive LIFO freelist with the array magazine
- Route C6 to the ULTRA path and A/B only the LIFO representation inside TLS
- ULTRA routing overrides MID v3/v3.5, so use it only in a research context
### Performance results
- C6-heavy (257-512B, 1M iter, ws=200, 5-run mean):
  - Baseline (C6=MID v3.5): 55.3M ops/s
  - ULTRA+array (intrusive OFF): 57.4M ops/s (+3.79% vs Baseline)
  - ULTRA+intrusive (intrusive ON): 54.5M ops/s (-1.44% vs Baseline, PASS)
  - IFL stats: push=265,890 / pop=265,815 / fallback=0 (perfect LIFO behavior)
- Mixed (16-1024B, standard mainline):
  - **ULTRA+intrusive regresses by about 14%**
  - Root cause:
    - The 8 classes (C0-C7) compete within a single TLS area, and the C4/C5/C6/C7 ULTRA TLS (about 2KB) ends up being fought over
    - ULTRA misses increase, and Legacy fallback reaches about 24%
- **Conclusion**: do not use C6 ULTRA on the Mixed mainline (as designed in `MIXED_TINYV3_C7_SAFE`).
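The percentage deltas quoted for the C6-heavy runs can be recomputed from the rounded means. The sketch below is only an illustration: `mean` and `pct_delta` are hypothetical helper names that are not part of the hakmem tree, and the benchmark's output format is not shown here, so the helpers take plain numbers. Because the inputs are already rounded to 0.1M ops/s, the recomputed values may differ from the reported +3.79% / -1.44% in the last digit.

```sh
#!/bin/sh
# Hypothetical helpers for post-processing benchmark numbers (not part of hakmem).

# mean: average the numbers read from stdin, one per line
# (for example, five ops/s readings from five runs).
mean() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}

# pct_delta BASELINE CANDIDATE: signed percentage change of CANDIDATE
# relative to BASELINE, printed with two decimals.
pct_delta() {
    awk -v base="$1" -v cand="$2" \
        'BEGIN { printf "%+.2f\n", (cand - base) / base * 100 }'
}

pct_delta 55.3 57.4   # ULTRA+array vs Baseline (M ops/s)
pct_delta 55.3 54.5   # ULTRA+intrusive vs Baseline (M ops/s)
```

Feeding the five per-run readings through `mean` first, and only then calling `pct_delta`, keeps the comparison on the same footing as the 5-run means reported above.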
### ENV (ULTRA intrusive opt-in)
```sh
HAKMEM_BENCH_MIN_SIZE=257
HAKMEM_BENCH_MAX_SIZE=512
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 # ★ Route C6 to the ULTRA path
HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1 # ★ Enable the intrusive LIFO freelist
HAKMEM_FREE_PATH_STATS=1 # For collecting stats
```
### Test commands
```sh
# Baseline: C6=MID v3.5 (no ULTRA routing)
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
./bench_random_mixed_hakmem 1000000 200 1
# ULTRA+array: array magazine (intrusive OFF)
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=0 \
HAKMEM_FREE_PATH_STATS=1 ./bench_random_mixed_hakmem 1000000 200 1
# ULTRA+intrusive: intrusive LIFO freelist
HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512 \
HAKMEM_TINY_C6_ULTRA_FREE_ENABLED=1 HAKMEM_TINY_C6_ULTRA_INTRUSIVE_FL=1 \
HAKMEM_FREE_PATH_STATS=1 ./bench_random_mixed_hakmem 1000000 200 1
```
### Expected values
- ULTRA+intrusive >= Baseline (or small regression < 5%)
- c6_ifl_fallback at (or near) 0 (intrusive LIFO working correctly)
- c6_ultra_free/alloc > 0 (ULTRA path is active)
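The three expectations above can be turned into a mechanical pass/fail check. This is a minimal sketch under stated assumptions: `check_c6_ultra` is a hypothetical function, the counter values are passed as arguments because the layout of the stats dump is not shown here, and the fallback criterion is treated as strictly zero.

```sh
#!/bin/sh
# Hypothetical checker (not part of hakmem).
# Usage: check_c6_ultra BASELINE_OPS CANDIDATE_OPS IFL_FALLBACK ULTRA_OPS
#   BASELINE_OPS / CANDIDATE_OPS : throughput in the same unit (e.g. M ops/s)
#   IFL_FALLBACK                 : c6_ifl_fallback counter (assumed: must be 0)
#   ULTRA_OPS                    : c6_ultra_free or c6_ultra_alloc counter
check_c6_ultra() {
    awk -v base="$1" -v cand="$2" -v fb="$3" -v ops="$4" 'BEGIN {
        drop = (base - cand) / base * 100   # positive value means a regression
        ok = 1
        if (drop >= 5) {
            printf "FAIL: regression %.2f%% (limit 5%%)\n", drop; ok = 0
        }
        if (fb + 0 != 0) {
            printf "FAIL: c6_ifl_fallback=%d (expected 0)\n", fb; ok = 0
        }
        if (ops + 0 <= 0) {
            print "FAIL: ULTRA path counter is 0 (path not active)"; ok = 0
        }
        if (ok) print "PASS"
        exit !ok                            # exit status 0 only when all checks pass
    }'
}

# Rounded C6-heavy means from this profile; the last argument is only an
# illustrative nonzero count standing in for c6_ultra_free.
check_c6_ultra 55.3 54.5 0 265890
```

The function returns a nonzero exit status on any failure, so it can gate a script that runs the test commands.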
### Notes
- **WARNING**: ULTRA routing overrides MID v3/v3.5 - use only in research context.
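- **Usage**: Use as a research box dedicated to C6-heavy workloads (not recommended for the Mixed mainline; it regresses).
- Do not ship on the mainline; treat it as a research box.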
---
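### Common notes
- Stacking one-off ENV settings on top of a preset makes results hard to reproduce, so start from one of the profiles above and always note what you changed.
- The v2 family (Pool v2 / Tiny v2) is opt-in per benchmark. Keep it at 0 whenever it is not needed.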
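One low-effort way to keep a record of what was changed is to snapshot every `HAKMEM_*` variable alongside each benchmark result. The helper below is a sketch with a hypothetical name (`hakmem_env_snapshot`), and it only sees variables that are exported in the current shell, not ones passed inline on a single command line.

```sh
#!/bin/sh
# Hypothetical helper (not part of hakmem): list all exported HAKMEM_* variables,
# sorted bytewise so that two snapshots can be compared with diff.
hakmem_env_snapshot() {
    env | grep '^HAKMEM_' | LC_ALL=C sort
}

export HAKMEM_BENCH_MIN_SIZE=257 HAKMEM_BENCH_MAX_SIZE=512
hakmem_env_snapshot
```

Redirecting the output to a file next to the benchmark log, then diffing two such files, shows exactly which ENV settings differ between runs.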