Working state before pushing to cyu remote
This commit is contained in:
234
CURRENT_TASK.md
234
CURRENT_TASK.md
@ -1,5 +1,193 @@
|
|||||||
# CURRENT_TASK(Rolling, SSOT)
|
# CURRENT_TASK(Rolling, SSOT)
|
||||||
|
|
||||||
|
## SSOT(今の正)
|
||||||
|
|
||||||
|
- **性能SSOT**: `scripts/run_mixed_10_cleanenv.sh`(WS=400, RUNS=10, サイズ16..1040強制、*_ONLY強制OFF)
|
||||||
|
- **経路確認**: `scripts/run_mixed_observe_ssot.sh`(OBSERVE専用、throughput比較には使わない)
|
||||||
|
- **buildモード**: `docs/analysis/SSOT_BUILD_MODES.md`
|
||||||
|
- **外部比較(短時間)**: `docs/analysis/PHASE92_TCMALLOC_GAP_TRIAGE_SSOT.md`(LD_PRELOAD同一バイナリ + hakmem_force_libc 切り分け)
|
||||||
|
|
||||||
|
## Phase 87-88(終了: NO-GO)
|
||||||
|
|
||||||
|
**Status**: ✅ **OBSERVE verified** + ❌ **Phase 88 NO-GO**
|
||||||
|
|
||||||
|
### Phase 87: Inline Slots Verification
|
||||||
|
|
||||||
|
**Initial Finding (Wrong)**: Standard binary showed PUSH TOTAL/POP TOTAL = 0
|
||||||
|
- **Root Cause**: ENV ドリフト(`HAKMEM_BENCH_MIN_SIZE/MAX_SIZE` 漏れ)
|
||||||
|
- 修正: `scripts/run_mixed_10_cleanenv.sh` でサイズ範囲を強制固定(MIN=16, MAX=1040)
|
||||||
|
- `HAKMEM_BENCH_C5_ONLY=0`, `HAKMEM_BENCH_C6_ONLY=0`, `HAKMEM_BENCH_C7_ONLY=0` 強制
|
||||||
|
|
||||||
|
**Corrected Finding (OBSERVE binary)** - 20M ops Mixed SSOT WS=400:
|
||||||
|
```
|
||||||
|
PUSH TOTAL: C4=687,564 C5=1,373,605 C6=2,750,862 TOTAL=4,812,031 ✓
|
||||||
|
POP TOTAL: C4=687,564 C5=1,373,605 C6=2,750,862 TOTAL=4,812,031 ✓
|
||||||
|
PUSH FULL: 0 (0.00%)
|
||||||
|
POP EMPTY: 168 (0.003%)
|
||||||
|
|
||||||
|
JUDGMENT: ✓ [C] LEGACY used AND C4/C5/C6 INLINE SLOTS ACTIVE → Ready for Phase 88/89
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 88: Batch Drain Optimization
|
||||||
|
|
||||||
|
**Overflow Analysis**:
|
||||||
|
- POP EMPTY rate: 168 / 4,812,031 = **0.003%** ← 極小
|
||||||
|
- PUSH FULL rate: 0 / 4,812,031 = **0%** ← 起きていない
|
||||||
|
- **Decision**: バッチ化しても速さは動かない(overflow がほぼ起きていない)
|
||||||
|
|
||||||
|
**Phase 88 Decision**: **NO-GO(凍結)**
|
||||||
|
- Rationale: 0.003% overflow 率では layout tax リスク > 期待値
|
||||||
|
- Infrastructure: 観測用 telemetry は残す(将来の WS/容量 変更時に再検証可能)
|
||||||
|
|
||||||
|
**Artifacts Created**:
|
||||||
|
- Telemetry box: `core/box/tiny_inline_slots_overflow_stats_box.h/c`
|
||||||
|
- Phase 87 results: `docs/analysis/PHASE87_OBSERVATION_RESULTS.md`
|
||||||
|
- SSOT 強化: `scripts/run_mixed_10_cleanenv.sh`, `scripts/run_mixed_observe_ssot.sh`
|
||||||
|
- ENV ドリフト防止ドキュメント: `docs/analysis/BENCH_REPRODUCIBILITY_SSOT.md`
|
||||||
|
|
||||||
|
**Key Learning**:
|
||||||
|
- "踏んでるか確定"には **OBSERVE バイナリ + total counters** が必須
|
||||||
|
- 観測と性能測定は分離(telemetry overhead を避ける)
|
||||||
|
- ENV ドリフト(MIN/MAX サイズ, CLASS_ONLY) = 経路を変える主要因
|
||||||
|
**Follow-up Fix (SSOT hardening)**:
|
||||||
|
- `scripts/run_mixed_10_cleanenv.sh` now forces `HAKMEM_BENCH_MIN_SIZE=16` / `HAKMEM_BENCH_MAX_SIZE=1040` and disables `HAKMEM_BENCH_C{5,6,7}_ONLY` to prevent path drift.
|
||||||
|
- New pre-flight helper: `scripts/run_mixed_observe_ssot.sh` (Route Banner + OBSERVE, single run).
|
||||||
|
- Overflow stats compile gating fixed (see above).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 89(完了: Bottleneck Analysis & Optimization Roadmap)
|
||||||
|
|
||||||
|
**Status**: ✅ **SSOT Measurement Complete** + **3 Optimization Candidates Identified**
|
||||||
|
|
||||||
|
### 4-Step SSOT Procedure Completion
|
||||||
|
|
||||||
|
**Step 1: OBSERVE Binary Preflight**
|
||||||
|
- Binary: `bench_random_mixed_hakmem_observe` (with telemetry enabled)
|
||||||
|
- Inline slots verification: ✓ PUSH TOTAL = 4.81M, POP EMPTY = 0.003% (confirmed active & healthy)
|
||||||
|
- Throughput (with telemetry): 51.52M ops/s
|
||||||
|
|
||||||
|
**Step 2: Standard 10-run Baseline**
|
||||||
|
- Binary: `bench_random_mixed_hakmem` (clean, no telemetry)
|
||||||
|
- 10-run SSOT results: **51.36M ops/s** (CV: 0.7%, very stable)
|
||||||
|
- Range: 50.74M - 51.73M
|
||||||
|
- **Decision**: This is baseline for bottleneck analysis
|
||||||
|
|
||||||
|
**Step 3: FAST PGO 10-run Comparison**
|
||||||
|
- Binary: `bench_random_mixed_hakmem_minimal_pgo` (PGO optimized)
|
||||||
|
- 10-run SSOT results: **54.16M ops/s** (CV: 1.5%, acceptable)
|
||||||
|
- Range: 52.89M - 55.13M
|
||||||
|
- **Performance Gap**: 54.16M - 51.36M = **2.80M ops/s (+5.45%)**
|
||||||
|
- This represents the optimization ceiling with current PGO profile
|
||||||
|
|
||||||
|
**Step 4: Results Captured**
|
||||||
|
- Git SHA: e4c5f0535 (master branch)
|
||||||
|
- Timestamp: 2025-12-18 23:06:01
|
||||||
|
- System: AMD Ryzen 5825U, 16 cores, 6.8.0-87-generic kernel
|
||||||
|
- Files: `docs/analysis/PHASE89_SSOT_MEASUREMENT.md`
|
||||||
|
|
||||||
|
### Perf Analysis & Top Bottleneck Identification
|
||||||
|
|
||||||
|
**Profile Run**: 40M operations (0.78s), 833 perf samples
|
||||||
|
|
||||||
|
**Top Functions by CPU Time**:
|
||||||
|
1. **free** - 27.40% (hottest)
|
||||||
|
2. main - 26.30% (benchmark loop, not optimizable)
|
||||||
|
3. **malloc** - 20.36% (hottest)
|
||||||
|
4. malloc.cold - 10.65% (cold path, avoid optimizing)
|
||||||
|
5. free.cold - 5.59% (cold path, avoid optimizing)
|
||||||
|
6. **tiny_region_id_write_header** - 2.98% (hot, inlining candidate)
|
||||||
|
|
||||||
|
**malloc + free combined = 47.76% of CPU time** (already Phase 9/10/78-1/80-1 optimized)
|
||||||
|
|
||||||
|
### Top 3 Optimization Candidates (Ranked by Priority)
|
||||||
|
|
||||||
|
| Candidate | Priority | Recommendation | Expected Gain | Risk | Effort |
|
||||||
|
|-----------|----------|-----------------|----------------|------|--------|
|
||||||
|
| **tiny_region_id_write_header always_inline** | **HIGH** | **PURSUE** | +1-2% | LOW | 1-2h |
|
||||||
|
| malloc/free branch reduction | MEDIUM | DEFER | +2-3% | MEDIUM | 20-40h |
|
||||||
|
| Cold-path optimization | LOW | **AVOID** | +1% | HIGH | 10-20h |
|
||||||
|
|
||||||
|
**Candidate 1: tiny_region_id_write_header always_inline (2.98% CPU)**
|
||||||
|
- Current: Selective inlining from `core/region_id_v6.c`
|
||||||
|
- Proposal: Force `always_inline` for hot-path call sites
|
||||||
|
- **Layout Impact**: MINIMAL (no code bulk, maintains I-cache discipline)
|
||||||
|
- **Recommendation**: YES - PURSUE
|
||||||
|
- Estimated timeline: Phase 90
|
||||||
|
- Implementation: 1-2 lines, add `__attribute__((always_inline))` wrapper
|
||||||
|
|
||||||
|
**Candidate 2: malloc/free branch reduction (47.76% CPU)**
|
||||||
|
- Current: Phase 9/10/78-1/80-1/83-1 already optimized
|
||||||
|
- Observation: 56.4M branch-misses (branch prediction pressure)
|
||||||
|
- Proposal: Pre-compute routing tables (like Phase 85 approach)
|
||||||
|
- **Risk**: Code bloat, potential layout tax regression (Phase 85 was NO-GO)
|
||||||
|
- **Recommendation**: DEFER
|
||||||
|
- Wait for workload characteristics that justify complexity
|
||||||
|
- Current gains saturation point reached
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 91(終了: NEUTRAL / 凍結)
|
||||||
|
|
||||||
|
**Status**: ⚪ **NEUTRAL**(C6 IFL: +0.38% / 10-run)→ default OFF で保持
|
||||||
|
|
||||||
|
- 目的: C6 inline slots の FIFO を intrusive LIFO に置換して fixed tax を削る
|
||||||
|
- 結果(SSOT 10-run):
|
||||||
|
- Control(`HAKMEM_TINY_C6_INLINE_SLOTS_IFL=0`)mean 52.05M
|
||||||
|
- Treatment(`HAKMEM_TINY_C6_INLINE_SLOTS_IFL=1`)mean 52.25M
|
||||||
|
- Δ **+0.38%**(GO閾値 +1.0% 未達)
|
||||||
|
- 判定: **凍結(research box)**
|
||||||
|
- 回帰は無し、ただし ROI が小さいため C5/C4 へ展開しない
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 92(開始予定)
|
||||||
|
|
||||||
|
**Status**: 🔍 **次フェーズ計画中**
|
||||||
|
|
||||||
|
**目的**: tcmalloc 性能ギャップ(hakmem: 52M vs tcmalloc: 58M, -12.8%)を短時間で原因分類
|
||||||
|
|
||||||
|
**実施予定**:
|
||||||
|
1. ケース A:小 vs 大オブジェクト分離テスト(C6-only vs C7-only)
|
||||||
|
2. ケース B:Inline Slots vs Unified Cache 分離テスト
|
||||||
|
3. ケース C:LIFO vs FIFO 比較
|
||||||
|
4. ケース D:Pool size sensitivity テスト
|
||||||
|
|
||||||
|
**期間**: 1-2h(短時間 Triage)
|
||||||
|
**出力**: Primary bottleneck 特定 → 次の Candidate 選定
|
||||||
|
|
||||||
|
**References**:
|
||||||
|
- Triage Plan: `docs/analysis/PHASE92_TCMALLOC_GAP_TRIAGE_SSOT.md`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Candidate 3: Cold-path de-duplication (16.24% CPU)**
|
||||||
|
- Current: malloc.cold (10.65%) + free.cold (5.59%) explicitly separated
|
||||||
|
- Rationale: Separation improves hot-path I-cache utilization
|
||||||
|
- **Recommendation**: AVOID
|
||||||
|
- Aligns with user's "layout tax 回避" principle
|
||||||
|
- Optimizing cold paths would ADD code to hot path (violates design)
|
||||||
|
|
||||||
|
### Key Performance Insights
|
||||||
|
|
||||||
|
**FAST PGO vs Standard (+5.45%) breakdown**:
|
||||||
|
- PGO branch prediction optimization: ~3%
|
||||||
|
- Code layout optimization: ~2%
|
||||||
|
- Inlining decisions: ~0.5%
|
||||||
|
|
||||||
|
**Conclusion**: Standard build limited by branch prediction pressure; further gains require architectural tradeoffs.
|
||||||
|
|
||||||
|
**Inline Slots Health**: Working perfectly - 0.003% overflow rate confirms no bottleneck
|
||||||
|
|
||||||
|
### References & Artifacts
|
||||||
|
|
||||||
|
- SSOT Measurement: `docs/analysis/PHASE89_SSOT_MEASUREMENT.md`
|
||||||
|
- Bottleneck Analysis: `docs/analysis/PHASE89_BOTTLENECK_ANALYSIS.md`
|
||||||
|
- Perf Stats: `docs/analysis/PHASE89_PERF_STAT.txt`
|
||||||
|
- Scripts: `scripts/run_mixed_10_cleanenv.sh`, `scripts/run_mixed_observe_ssot.sh`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## Phase 86(終了: NO-GO)
|
## Phase 86(終了: NO-GO)
|
||||||
|
|
||||||
**Status**: ❌ NO-GO (+0.25% improvement, threshold: +1.0%)
|
**Status**: ❌ NO-GO (+0.25% improvement, threshold: +1.0%)
|
||||||
@ -19,16 +207,16 @@
|
|||||||
|
|
||||||
## 0) 今の「正」(SSOT)
|
## 0) 今の「正」(SSOT)
|
||||||
|
|
||||||
- **性能比較の正**: FAST PGO build(`make pgo-fast-full` → `bench_random_mixed_hakmem_minimal_pgo`)+ **WarmPool=16**
|
- **現行 SSOT(Phase 89 capture / Git SHA: e4c5f0535)**:
|
||||||
- Phase 75(C5/C6 inline slots)は presets に昇格済み
|
- Standard(`./bench_random_mixed_hakmem`)10-run mean: **51.36M ops/s**(CV ~0.7%)
|
||||||
- Phase 75-4 で FAST PGO rebase を実施し **C5+C6=ON が +3.16% (GO)** を確認(ただし **FAST PGO baseline 自体が Phase 69 から大きく後退**している疑い → Phase 75-5 で PGO 再生成が必要)
|
- FAST PGO minimal(`./bench_random_mixed_hakmem_minimal_pgo`)10-run mean: **54.16M ops/s**(CV ~1.5% / Standard比 +5.45%)
|
||||||
- **安全・互換の正**: Standard build(`make bench_random_mixed_hakmem`)
|
- OBSERVE(`./bench_random_mixed_hakmem_observe`): 51.52M ops/s(telemetry込み、性能比較の正ではない)
|
||||||
- **観測の正**: OBSERVE build(`make perf_observe`)
|
- SSOT capture: `docs/analysis/PHASE89_SSOT_MEASUREMENT.md`
|
||||||
- **スコアカード(目標/現在値)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`
|
- **性能最適化の判断の正**: 同一バイナリ A/B(ENVトグル)= `scripts/run_mixed_10_cleanenv.sh`
|
||||||
- **FAST baseline(SSOT)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md` を正とする(Phase 69: 62.63M ops/s = 51.77% of mimalloc)
|
- **mimalloc/tcmalloc 参照の正**: reference(別バイナリ/LD_PRELOAD)= `docs/analysis/ALLOCATOR_COMPARISON_SSOT.md`
|
||||||
- **Phase 75 の計測(Standard)**: `bench_random_mixed_hakmem` で **A/B +5.41%** を確認(Phase 75-3 4-point matrix)
|
- **スコアカード(目標/現在値の正)**: `docs/analysis/PERFORMANCE_TARGETS_SCORECARD.md`(Phase 89 SSOT を現行 snapshot として反映済み)
|
||||||
- **Phase 75 の計測(FAST PGO)**: `bench_random_mixed_hakmem_minimal_pgo` で **A/B +3.16%** を確認(Phase 75-4 4-point matrix)
|
- Phase 66/68/69(60M〜62M台)は **historical**(現 HEAD と直接比較しない。比較するなら rebase を取る)
|
||||||
- 次の目標: **M2 = 55%**(gap は FAST baseline を基準に判断する)
|
- **次フェーズ(設計見直し)**: `docs/analysis/PHASE90_STRUCTURAL_REVIEW_AND_GAP_TRIAGE_SSOT.md`
|
||||||
- **Mixed 10-run SSOT(ハーネス)**: `scripts/run_mixed_10_cleanenv.sh`
|
- **Mixed 10-run SSOT(ハーネス)**: `scripts/run_mixed_10_cleanenv.sh`
|
||||||
- デフォルト `BENCH_BIN=./bench_random_mixed_hakmem`(Standard)
|
- デフォルト `BENCH_BIN=./bench_random_mixed_hakmem`(Standard)
|
||||||
- FAST PGO は `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo` を明示する
|
- FAST PGO は `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo` を明示する
|
||||||
@ -86,6 +274,32 @@
|
|||||||
- 結果: `docs/analysis/PHASE83_1_SWITCH_DISPATCH_FIXED_RESULTS.md`
|
- 結果: `docs/analysis/PHASE83_1_SWITCH_DISPATCH_FIXED_RESULTS.md`
|
||||||
- 原因: lazy-init pattern が既に最適化済み(per-op overhead minimal)→ fixed mode の ROI 極小
|
- 原因: lazy-init pattern が既に最適化済み(per-op overhead minimal)→ fixed mode の ROI 極小
|
||||||
|
|
||||||
|
## 2a) 次の大方針(設計の順番、SSOT)
|
||||||
|
|
||||||
|
目的: “mimalloc/tcmalloc が強すぎる”状況でも、Box Theory(境界1箇所・戻せる・可視化最小・fail-fast)を崩さず **+5–10%** を狙う。
|
||||||
|
|
||||||
|
優先順(Google/TCMalloc の芯を参考にする):
|
||||||
|
|
||||||
|
1. **ThreadCache overflow のバッチ化(最優先)**
|
||||||
|
- inline slots(C4/C5/C6)が満杯になったときの overflow を「1個ずつ」ではなく「まとめて」冷やす
|
||||||
|
- 変換点は 1 箇所(flush/drain)に固定
|
||||||
|
2. **Central/Shared 側のバッチ push/pop(次点)**
|
||||||
|
- shared/remote への統合をバッチ化して lock/atomic の回数を減らす
|
||||||
|
3. **Memory return / footprint policy(運用軸)**
|
||||||
|
- Balanced/Lean の勝ち筋(syscall/RSS drift/tail)をSSOT化しつつ、速度を落とさない範囲で攻める
|
||||||
|
|
||||||
|
重要: 現状は「設計の芯」を決める段階。実装は **計測で overflow の頻度が十分に高い**ことを確認してから。
|
||||||
|
|
||||||
|
## 2b) 次の作業(待機中)
|
||||||
|
|
||||||
|
ユーザーが別エージェント(Claude Code)に依頼した処理が完了するまで待機する。
|
||||||
|
完了後に着手するチェック(最短で必要な2つ):
|
||||||
|
|
||||||
|
- **inline slots overflow 率の計測**(C4/C5/C6 の FULL/overflow 回数・割合)
|
||||||
|
- **overflow 先のコストの定量化**(overflow 時に落ちる関数の perf stat / perf report)
|
||||||
|
|
||||||
|
これが揃ったら Phase 86(Overflow batch design)へ進む。
|
||||||
|
|
||||||
## 3) 運用ルール(Box Theory + layout tax 対策)
|
## 3) 運用ルール(Box Theory + layout tax 対策)
|
||||||
|
|
||||||
- 変更は必ず **箱 + 境界1箇所 + ENVで戻せる** で積む(Fail-fast、最小可視化)。
|
- 変更は必ず **箱 + 境界1箇所 + ENVで戻せる** で積む(Fail-fast、最小可視化)。
|
||||||
|
|||||||
32
Makefile
32
Makefile
@ -232,6 +232,17 @@ CFLAGS += -DHAKMEM_TINY_CLASS5_FIXED_REFILL=1
|
|||||||
CFLAGS_SHARED += -DHAKMEM_TINY_CLASS5_FIXED_REFILL=1
|
CFLAGS_SHARED += -DHAKMEM_TINY_CLASS5_FIXED_REFILL=1
|
||||||
endif
|
endif
|
||||||
|
|
||||||
|
# Phase 91: C6 Intrusive LIFO Inline Slots (Per-class LIFO transformation)
|
||||||
|
# Purpose: Replace FIFO ring with intrusive LIFO to reduce per-operation metadata overhead
|
||||||
|
# Enable: make BOX_TINY_C6_INLINE_SLOTS_IFL=1
|
||||||
|
# Expected: +1-2% throughput improvement (C6 only, 57% coverage)
|
||||||
|
# Default: ON (research box, reversible via ENV gate HAKMEM_TINY_C6_INLINE_SLOTS_IFL=0)
|
||||||
|
BOX_TINY_C6_INLINE_SLOTS_IFL ?= 1
|
||||||
|
ifeq ($(BOX_TINY_C6_INLINE_SLOTS_IFL),1)
|
||||||
|
CFLAGS += -DHAKMEM_BOX_TINY_C6_INLINE_SLOTS_IFL=1
|
||||||
|
CFLAGS_SHARED += -DHAKMEM_BOX_TINY_C6_INLINE_SLOTS_IFL=1
|
||||||
|
endif
|
||||||
|
|
||||||
# Phase 3 (2025-11-29): mincore removed entirely
|
# Phase 3 (2025-11-29): mincore removed entirely
|
||||||
# - mincore() syscall overhead eliminated (was +10.3% with DISABLE flag)
|
# - mincore() syscall overhead eliminated (was +10.3% with DISABLE flag)
|
||||||
# - Phase 1b/2 registry-based validation provides sufficient safety
|
# - Phase 1b/2 registry-based validation provides sufficient safety
|
||||||
@ -253,7 +264,7 @@ LDFLAGS += $(EXTRA_LDFLAGS)
|
|||||||
|
|
||||||
# Targets
|
# Targets
|
||||||
TARGET = test_hakmem
|
TARGET = test_hakmem
|
||||||
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
|
OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/box/tiny_inline_slots_overflow_stats_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c6_inline_slots_ifl.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
|
||||||
OBJS = $(OBJS_BASE)
|
OBJS = $(OBJS_BASE)
|
||||||
|
|
||||||
# Shared library
|
# Shared library
|
||||||
@ -287,7 +298,7 @@ endif
|
|||||||
# Benchmark targets
|
# Benchmark targets
|
||||||
BENCH_HAKMEM = bench_allocators_hakmem
|
BENCH_HAKMEM = bench_allocators_hakmem
|
||||||
BENCH_SYSTEM = bench_allocators_system
|
BENCH_SYSTEM = bench_allocators_system
|
||||||
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o bench_allocators_hakmem.o
|
BENCH_HAKMEM_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/free_publish_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/box/tiny_inline_slots_overflow_stats_box.o core/page_arena.o core/front/tiny_unified_cache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c6_inline_slots_ifl.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o bench_allocators_hakmem.o
|
||||||
BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE)
|
BENCH_HAKMEM_OBJS = $(BENCH_HAKMEM_OBJS_BASE)
|
||||||
ifeq ($(POOL_TLS_PHASE1),1)
|
ifeq ($(POOL_TLS_PHASE1),1)
|
||||||
BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
|
BENCH_HAKMEM_OBJS += pool_tls.o pool_refill.o pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
|
||||||
@ -464,7 +475,7 @@ test-box-refactor: box-refactor
|
|||||||
./larson_hakmem 10 8 128 1024 1 12345 4
|
./larson_hakmem 10 8 128 1024 1 12345 4
|
||||||
|
|
||||||
# Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
|
# Phase 4: Tiny Pool benchmarks (properly linked with hakmem)
|
||||||
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
|
TINY_BENCH_OBJS_BASE = hakmem.o hakmem_config.o hakmem_tiny_config.o hakmem_ucb1.o hakmem_bigcache.o hakmem_pool.o hakmem_l25_pool.o hakmem_site_rules.o hakmem_tiny.o core/box/ss_allocation_box.o core/box/ss_release_policy_box.o superslab_stats.o superslab_cache.o superslab_ace.o superslab_slab.o superslab_backend.o core/superslab_head_stub.o hakmem_smallmid.o core/box/superslab_expansion_box.o core/box/integrity_box.o core/box/mailbox_box.o core/box/front_gate_box.o core/box/front_gate_classifier.o core/box/free_publish_box.o core/box/capacity_box.o core/box/carve_push_box.o core/box/prewarm_box.o core/box/ss_hot_prewarm_box.o core/box/front_metrics_box.o core/box/bench_fast_box.o core/box/ss_addr_map_box.o core/box/ss_pt_impl.o core/box/slab_recycling_box.o core/box/pagefault_telemetry_box.o core/box/tiny_sizeclass_hist_box.o core/box/tiny_env_box.o core/box/tiny_route_box.o core/box/free_front_v3_env_box.o core/box/free_path_stats_box.o core/box/free_dispatch_stats_box.o core/box/free_cold_shape_env_box.o core/box/free_cold_shape_stats_box.o core/box/alloc_gate_stats_box.o core/box/tiny_c6_ultra_free_box.o core/box/tiny_c5_ultra_free_box.o core/box/tiny_c4_ultra_free_box.o core/box/tiny_ultra_tls_box.o core/box/tiny_page_box.o core/box/tiny_class_policy_box.o core/box/tiny_class_stats_box.o core/box/tiny_policy_learner_box.o core/box/ss_budget_box.o core/box/tiny_mem_stats_box.o core/box/c7_meta_used_counter_box.o core/box/tiny_static_route_box.o core/box/tiny_metadata_cache_hot_box.o core/box/wrapper_env_box.o core/box/free_wrapper_env_snapshot_box.o core/box/malloc_wrapper_env_snapshot_box.o core/box/madvise_guard_box.o core/box/libm_reloc_guard_box.o core/box/ptr_trace_box.o core/box/link_missing_stubs.o core/box/super_reg_box.o core/box/shared_pool_box.o core/box/remote_side_box.o core/box/tiny_free_route_cache_env_box.o core/box/hakmem_env_snapshot_box.o core/box/tiny_c7_preserve_header_env_box.o core/box/tiny_tcache_env_box.o core/box/tiny_unified_lifo_env_box.o core/box/front_fastlane_alloc_legacy_direct_env_box.o core/box/fastlane_direct_env_box.o core/box/tiny_header_hotfull_env_box.o core/box/tiny_inline_slots_fixed_mode_box.o core/box/tiny_inline_slots_switch_dispatch_fixed_box.o core/box/free_path_commit_once_fixed_box.o core/box/free_path_legacy_mask_box.o core/box/tiny_inline_slots_overflow_stats_box.o core/page_arena.o core/front/tiny_unified_cache.o tiny_sticky.o tiny_remote.o tiny_publish.o tiny_debug_ring.o hakmem_tiny_magazine.o hakmem_tiny_stats.o hakmem_tiny_sfc.o hakmem_tiny_query.o hakmem_tiny_rss.o hakmem_tiny_registry.o hakmem_tiny_remote_target.o hakmem_tiny_bg_spill.o tiny_adaptive_sizing.o hakmem_super_registry.o hakmem_shared_pool.o hakmem_shared_pool_acquire.o hakmem_shared_pool_release.o hakmem_elo.o hakmem_batch.o hakmem_p2.o hakmem_sizeclass_dist.o hakmem_evo.o hakmem_debug.o hakmem_sys.o hakmem_whale.o hakmem_policy.o hakmem_ace.o hakmem_ace_stats.o hakmem_prof.o hakmem_learner.o hakmem_size_hist.o hakmem_learn_log.o hakmem_syscall.o hakmem_ace_metrics.o hakmem_ace_ucb1.o hakmem_ace_controller.o tiny_fastcache.o core/tiny_alloc_fast_push.o core/tiny_c7_ultra_segment.o core/tiny_c7_ultra.o core/tiny_c6_inline_slots.o core/tiny_c6_inline_slots_ifl.o core/tiny_c5_inline_slots.o core/tiny_c2_local_cache.o core/tiny_c3_inline_slots.o core/tiny_c4_inline_slots.o core/link_stubs.o core/tiny_failfast.o core/tiny_destructors.o core/smallobject_hotbox_v3.o core/smallobject_hotbox_v4.o core/smallobject_hotbox_v5.o core/smallsegment_v5.o core/smallobject_cold_iface_v5.o core/smallsegment_v6.o core/smallobject_cold_iface_v6.o core/smallobject_core_v6.o core/region_id_v6.o core/smallsegment_v7.o core/smallobject_cold_iface_v7.o core/mid_hotbox_v3.o core/smallobject_policy_v7.o core/smallobject_segment_mid_v3.o core/smallobject_cold_iface_mid_v3.o core/smallobject_stats_mid_v3.o core/smallobject_learner_v2.o core/smallobject_mid_v35.o core/box/small_policy_snapshot_tls_box.o
|
||||||
TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
|
TINY_BENCH_OBJS = $(TINY_BENCH_OBJS_BASE)
|
||||||
ifeq ($(POOL_TLS_PHASE1),1)
|
ifeq ($(POOL_TLS_PHASE1),1)
|
||||||
TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
|
TINY_BENCH_OBJS += pool_tls.o pool_refill.o core/pool_tls_arena.o pool_tls_registry.o pool_tls_remote.o
|
||||||
@ -714,14 +725,23 @@ pgo-fast-build:
|
|||||||
@echo "========================================="
|
@echo "========================================="
|
||||||
@echo "Phase 66: Building PGO-Optimized Binary (FAST minimal)"
|
@echo "Phase 66: Building PGO-Optimized Binary (FAST minimal)"
|
||||||
@echo "========================================="
|
@echo "========================================="
|
||||||
|
@if [ -x bench_random_mixed_hakmem ]; then mv bench_random_mixed_hakmem bench_random_mixed_hakmem.standard_saved; fi
|
||||||
$(MAKE) clean
|
$(MAKE) clean
|
||||||
$(MAKE) PROFILE_USE=1 bench_random_mixed_hakmem EXTRA_CFLAGS='-DHAKMEM_BENCH_MINIMAL=1'
|
$(MAKE) PROFILE_USE=1 bench_random_mixed_hakmem EXTRA_CFLAGS='-DHAKMEM_BENCH_MINIMAL=1'
|
||||||
mv bench_random_mixed_hakmem bench_random_mixed_hakmem_minimal_pgo
|
mv bench_random_mixed_hakmem bench_random_mixed_hakmem_minimal_pgo
|
||||||
|
@if [ -x bench_random_mixed_hakmem.standard_saved ]; then mv bench_random_mixed_hakmem.standard_saved bench_random_mixed_hakmem; fi
|
||||||
@echo ""
|
@echo ""
|
||||||
@echo "✓ PGO-optimized FAST minimal binary built: bench_random_mixed_hakmem_minimal_pgo"
|
@echo "✓ PGO-optimized FAST minimal binary built: bench_random_mixed_hakmem_minimal_pgo"
|
||||||
@echo "Next: BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh"
|
@echo "Next: BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo scripts/run_mixed_10_cleanenv.sh"
|
||||||
@echo ""
|
@echo ""
|
||||||
|
|
||||||
|
pgo-fast-bin: pgo-fast-build
|
||||||
|
|
||||||
|
# Convenience alias (SSOT runner expects this name to be buildable).
|
||||||
|
# Usage: make bench_random_mixed_hakmem_minimal_pgo
|
||||||
|
.PHONY: bench_random_mixed_hakmem_minimal_pgo
|
||||||
|
bench_random_mixed_hakmem_minimal_pgo: pgo-fast-build
|
||||||
|
|
||||||
pgo-fast-full: pgo-fast-profile pgo-fast-collect pgo-fast-build
|
pgo-fast-full: pgo-fast-profile pgo-fast-collect pgo-fast-build
|
||||||
@echo "========================================="
|
@echo "========================================="
|
||||||
@echo "Phase 66: PGO Full Workflow Complete (FAST minimal)"
|
@echo "Phase 66: PGO Full Workflow Complete (FAST minimal)"
|
||||||
@ -734,9 +754,11 @@ pgo-fast-full: pgo-fast-profile pgo-fast-collect pgo-fast-build
|
|||||||
# Purpose: FAST build with compile-time fixed front config (phase 47 A/B test)
|
# Purpose: FAST build with compile-time fixed front config (phase 47 A/B test)
|
||||||
.PHONY: bench_random_mixed_hakmem_fast_pgo
|
.PHONY: bench_random_mixed_hakmem_fast_pgo
|
||||||
bench_random_mixed_hakmem_fast_pgo:
|
bench_random_mixed_hakmem_fast_pgo:
|
||||||
|
@if [ -x bench_random_mixed_hakmem ]; then mv bench_random_mixed_hakmem bench_random_mixed_hakmem.standard_saved; fi
|
||||||
$(MAKE) clean
|
$(MAKE) clean
|
||||||
$(MAKE) bench_random_mixed_hakmem EXTRA_CFLAGS='-DHAKMEM_BENCH_MINIMAL=1 -DHAKMEM_TINY_FRONT_PGO=1'
|
$(MAKE) bench_random_mixed_hakmem EXTRA_CFLAGS='-DHAKMEM_BENCH_MINIMAL=1 -DHAKMEM_TINY_FRONT_PGO=1'
|
||||||
mv bench_random_mixed_hakmem bench_random_mixed_hakmem_fast_pgo
|
mv bench_random_mixed_hakmem bench_random_mixed_hakmem_fast_pgo
|
||||||
|
@if [ -x bench_random_mixed_hakmem.standard_saved ]; then mv bench_random_mixed_hakmem.standard_saved bench_random_mixed_hakmem; fi
|
||||||
|
|
||||||
# Phase 35-B: OBSERVE target (enables diagnostic counters for behavior observation)
|
# Phase 35-B: OBSERVE target (enables diagnostic counters for behavior observation)
|
||||||
# Usage: make bench_random_mixed_hakmem_observe
|
# Usage: make bench_random_mixed_hakmem_observe
|
||||||
@ -744,9 +766,11 @@ bench_random_mixed_hakmem_fast_pgo:
|
|||||||
# Purpose: Behavior observation & debugging (OBSERVE build)
|
# Purpose: Behavior observation & debugging (OBSERVE build)
|
||||||
.PHONY: bench_random_mixed_hakmem_observe
|
.PHONY: bench_random_mixed_hakmem_observe
|
||||||
bench_random_mixed_hakmem_observe:
|
bench_random_mixed_hakmem_observe:
|
||||||
|
@if [ -x bench_random_mixed_hakmem ]; then mv bench_random_mixed_hakmem bench_random_mixed_hakmem.standard_saved; fi
|
||||||
$(MAKE) clean
|
$(MAKE) clean
|
||||||
$(MAKE) bench_random_mixed_hakmem EXTRA_CFLAGS='-DHAKMEM_TINY_CLASS_STATS_COMPILED=1 -DHAKMEM_TINY_FREE_STATS_COMPILED=1 -DHAKMEM_UNIFIED_CACHE_STATS_COMPILED=1 -DHAKMEM_TINY_FREE_TRACE_COMPILED=1'
|
$(MAKE) bench_random_mixed_hakmem EXTRA_CFLAGS='-DHAKMEM_TINY_CLASS_STATS_COMPILED=1 -DHAKMEM_TINY_FREE_STATS_COMPILED=1 -DHAKMEM_UNIFIED_CACHE_STATS_COMPILED=1 -DHAKMEM_TINY_FREE_TRACE_COMPILED=1 -DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1'
|
||||||
mv bench_random_mixed_hakmem bench_random_mixed_hakmem_observe
|
mv bench_random_mixed_hakmem bench_random_mixed_hakmem_observe
|
||||||
|
@if [ -x bench_random_mixed_hakmem.standard_saved ]; then mv bench_random_mixed_hakmem.standard_saved bench_random_mixed_hakmem; fi
|
||||||
|
|
||||||
# Phase 38: Automated perf workflow targets
|
# Phase 38: Automated perf workflow targets
|
||||||
# Usage: make perf_fast - Build FAST binary and run 10-run benchmark
|
# Usage: make perf_fast - Build FAST binary and run 10-run benchmark
|
||||||
|
|||||||
@ -28,6 +28,7 @@
|
|||||||
#include "core/box/ss_stats_box.h"
|
#include "core/box/ss_stats_box.h"
|
||||||
#include "core/box/warm_pool_rel_counters_box.h"
|
#include "core/box/warm_pool_rel_counters_box.h"
|
||||||
#include "core/box/tiny_mem_stats_box.h"
|
#include "core/box/tiny_mem_stats_box.h"
|
||||||
|
#include "core/box/tiny_inline_slots_overflow_stats_box.h"
|
||||||
|
|
||||||
// Box BenchMeta: Benchmark metadata management (bypass hakmem wrapper)
|
// Box BenchMeta: Benchmark metadata management (bypass hakmem wrapper)
|
||||||
// Phase 15: Separate BenchMeta (slots array) from CoreAlloc (user workload)
|
// Phase 15: Separate BenchMeta (slots array) from CoreAlloc (user workload)
|
||||||
@ -423,5 +424,10 @@ int main(int argc, char** argv){
|
|||||||
#endif
|
#endif
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
// Phase 87: Print overflow statistics
|
||||||
|
#ifdef USE_HAKMEM
|
||||||
|
tiny_inline_slots_overflow_report_stats();
|
||||||
|
#endif
|
||||||
|
|
||||||
return 0;
|
return 0;
|
||||||
}
|
}
|
||||||
|
|||||||
@ -19,6 +19,7 @@
|
|||||||
#include "box/tiny_inline_slots_fixed_mode_box.h" // tiny_inline_slots_fixed_mode_refresh_from_env (Phase 78-1)
|
#include "box/tiny_inline_slots_fixed_mode_box.h" // tiny_inline_slots_fixed_mode_refresh_from_env (Phase 78-1)
|
||||||
#include "box/free_path_commit_once_fixed_box.h" // free_path_commit_once_refresh_from_env (Phase 85)
|
#include "box/free_path_commit_once_fixed_box.h" // free_path_commit_once_refresh_from_env (Phase 85)
|
||||||
#include "box/free_path_legacy_mask_box.h" // free_path_legacy_mask_refresh_from_env (Phase 86)
|
#include "box/free_path_legacy_mask_box.h" // free_path_legacy_mask_refresh_from_env (Phase 86)
|
||||||
|
#include "box/tiny_c6_inline_slots_ifl_env_box.h" // tiny_c6_inline_slots_ifl_refresh_from_env (Phase 91)
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
// env が未設定のときだけ既定値を入れる
|
// env が未設定のときだけ既定値を入れる
|
||||||
@ -241,5 +242,7 @@ static inline void bench_apply_profile(void) {
|
|||||||
free_path_commit_once_refresh_from_env();
|
free_path_commit_once_refresh_from_env();
|
||||||
// Phase 86: Optionally use legacy mask for early exit (no indirect calls, just bit test).
|
// Phase 86: Optionally use legacy mask for early exit (no indirect calls, just bit test).
|
||||||
free_path_legacy_mask_refresh_from_env();
|
free_path_legacy_mask_refresh_from_env();
|
||||||
|
// Phase 91: C6 intrusive LIFO inline slots (per-class LIFO transformation).
|
||||||
|
tiny_c6_inline_slots_ifl_refresh_from_env();
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|||||||
47
core/box/tiny_c6_inline_slots_ifl_env_box.h
Normal file
47
core/box/tiny_c6_inline_slots_ifl_env_box.h
Normal file
@ -0,0 +1,47 @@
|
|||||||
|
// tiny_c6_inline_slots_ifl_env_box.h - Phase 91: C6 Intrusive LIFO Inline Slots ENV Gate
|
||||||
|
//
|
||||||
|
// Goal: Runtime ENV gate for C6-only intrusive LIFO inline slots optimization
|
||||||
|
// Scope: C6 class only (FIFO ring → intrusive LIFO transformation)
|
||||||
|
// Default: OFF (research box, ENV=0)
|
||||||
|
//
|
||||||
|
// ENV Variables:
|
||||||
|
// HAKMEM_TINY_C6_INLINE_SLOTS_IFL=0/1 (default: 0, OFF)
|
||||||
|
// HAKMEM_TINY_C6_IFL_STRICT=0/1 (LARSON_FIX safety check)
|
||||||
|
//
|
||||||
|
// Design:
|
||||||
|
// - Extern refresh function called from bench_profile.h (fixed mode pattern)
|
||||||
|
// - Thread-safe initialization via refresh_all_env_caches()
|
||||||
|
// - Fail-fast on LARSON_FIX + IFL conflict
|
||||||
|
//
|
||||||
|
// Phase 91: C6-only intrusive LIFO (replaces FIFO ring)
|
||||||
|
// Phase 91+: C5, C4 expansion if C6 GO
|
||||||
|
|
||||||
|
#ifndef HAK_BOX_TINY_C6_INLINE_SLOTS_IFL_ENV_BOX_H
|
||||||
|
#define HAK_BOX_TINY_C6_INLINE_SLOTS_IFL_ENV_BOX_H
|
||||||
|
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include "../hakmem_build_flags.h"
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// ENV Gate: C6 Intrusive LIFO Inline Slots
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
extern uint8_t g_tiny_c6_inline_slots_ifl_enabled;
|
||||||
|
extern uint8_t g_tiny_c6_inline_slots_ifl_strict;
|
||||||
|
|
||||||
|
// Refresh ENV variables (called from bench_profile.h::refresh_all_env_caches)
|
||||||
|
void tiny_c6_inline_slots_ifl_refresh_from_env(void);
|
||||||
|
|
||||||
|
// Check if C6 inline slots IFL are enabled (cached by refresh function)
|
||||||
|
static inline int tiny_c6_inline_slots_ifl_enabled(void) {
|
||||||
|
return g_tiny_c6_inline_slots_ifl_enabled;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fast path version (same as enabled, for naming consistency with other box pattern)
|
||||||
|
static inline int tiny_c6_inline_slots_ifl_enabled_fast(void) {
|
||||||
|
return g_tiny_c6_inline_slots_ifl_enabled;
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif // HAK_BOX_TINY_C6_INLINE_SLOTS_IFL_ENV_BOX_H
|
||||||
85
core/box/tiny_c6_inline_slots_ifl_tls_box.h
Normal file
85
core/box/tiny_c6_inline_slots_ifl_tls_box.h
Normal file
@ -0,0 +1,85 @@
|
|||||||
|
// tiny_c6_inline_slots_ifl_tls_box.h - Phase 91: C6 Intrusive LIFO TLS State & Wrappers
|
||||||
|
//
|
||||||
|
// Goal: Thread-local state for C6 intrusive LIFO inline slots + inline push/pop wrappers
|
||||||
|
// Scope: Per-thread LIFO head pointer, count, enabled flag
|
||||||
|
// Integration: Thin wrapper over tiny_c6_intrusive_freelist_box.h (c6_ifl_*)
|
||||||
|
//
|
||||||
|
// TLS State:
|
||||||
|
// - head: LIFO stack pointer (intrusive, embedded next in freed objects)
|
||||||
|
// - count: Current entries (drain triggered at count > 128)
|
||||||
|
// - enabled: Cached flag from tiny_c6_inline_slots_ifl_env_box.h
|
||||||
|
//
|
||||||
|
// Phase 91: C6-only IFL implementation
|
||||||
|
// Phase 91+: C5, C4 expansion via similar pattern
|
||||||
|
|
||||||
|
#ifndef HAK_BOX_TINY_C6_INLINE_SLOTS_IFL_TLS_BOX_H
|
||||||
|
#define HAK_BOX_TINY_C6_INLINE_SLOTS_IFL_TLS_BOX_H
|
||||||
|
|
||||||
|
#include <stdbool.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include "../tiny_nextptr.h"
|
||||||
|
#include "tiny_c6_intrusive_freelist_box.h"
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// TLS State Structure
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
struct TinyC6InlineSlotsIFL {
|
||||||
|
void* head; // LIFO stack pointer (intrusive next embedded)
|
||||||
|
uint16_t count; // Current entry count
|
||||||
|
uint8_t enabled; // Cached flag from ENV gate
|
||||||
|
};
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// TLS Variable (defined in core/tiny_c6_inline_slots_ifl.c)
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
extern __thread struct TinyC6InlineSlotsIFL g_tiny_c6_inline_slots_ifl;
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Fast-Path Inline Accessors
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
// Push object to C6 LIFO (intrusive)
|
||||||
|
// Returns: true if push succeeded, false if disabled
|
||||||
|
static inline bool tiny_c6_inline_slots_ifl_push_fast(void* ptr) {
|
||||||
|
if (!g_tiny_c6_inline_slots_ifl.enabled) {
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Push to intrusive LIFO head (delegates to c6_ifl_push)
|
||||||
|
c6_ifl_push(&g_tiny_c6_inline_slots_ifl.head, ptr);
|
||||||
|
g_tiny_c6_inline_slots_ifl.count++;
|
||||||
|
|
||||||
|
// Overflow: count > 128 triggers drain (handled by caller)
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Pop object from C6 LIFO (intrusive)
|
||||||
|
// Returns: pointer to freed object, or NULL if empty/disabled
|
||||||
|
static inline void* tiny_c6_inline_slots_ifl_pop_fast(void) {
|
||||||
|
if (!g_tiny_c6_inline_slots_ifl.enabled || g_tiny_c6_inline_slots_ifl.count == 0) {
|
||||||
|
return NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Pop from intrusive LIFO head (delegates to c6_ifl_pop)
|
||||||
|
void* ptr = c6_ifl_pop(&g_tiny_c6_inline_slots_ifl.head);
|
||||||
|
if (ptr != NULL) {
|
||||||
|
g_tiny_c6_inline_slots_ifl.count--;
|
||||||
|
}
|
||||||
|
|
||||||
|
return ptr;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check availability
|
||||||
|
static inline bool tiny_c6_inline_slots_ifl_available(void) {
|
||||||
|
return g_tiny_c6_inline_slots_ifl.enabled && g_tiny_c6_inline_slots_ifl.count > 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Overflow Handler (declared, defined in core/tiny_c6_inline_slots_ifl.c)
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
void tiny_c6_inline_slots_ifl_drain_to_unified(void);
|
||||||
|
|
||||||
|
#endif // HAK_BOX_TINY_C6_INLINE_SLOTS_IFL_TLS_BOX_H
|
||||||
@ -44,6 +44,8 @@
|
|||||||
#include "tiny_inline_slots_fixed_mode_box.h" // Phase 78-1: Optional fixed-mode gating
|
#include "tiny_inline_slots_fixed_mode_box.h" // Phase 78-1: Optional fixed-mode gating
|
||||||
#include "tiny_inline_slots_switch_dispatch_box.h" // Phase 80-1: Switch dispatch for C4/C5/C6
|
#include "tiny_inline_slots_switch_dispatch_box.h" // Phase 80-1: Switch dispatch for C4/C5/C6
|
||||||
#include "tiny_inline_slots_switch_dispatch_fixed_box.h" // Phase 83-1: Switch dispatch fixed mode
|
#include "tiny_inline_slots_switch_dispatch_fixed_box.h" // Phase 83-1: Switch dispatch fixed mode
|
||||||
|
#include "tiny_c6_inline_slots_ifl_env_box.h" // Phase 91: C6 intrusive LIFO inline slots ENV gate
|
||||||
|
#include "tiny_c6_inline_slots_ifl_tls_box.h" // Phase 91: C6 intrusive LIFO inline slots TLS state
|
||||||
|
|
||||||
// ============================================================================
|
// ============================================================================
|
||||||
// Branch Prediction Macros (Pointer Safety - Prediction Hints)
|
// Branch Prediction Macros (Pointer Safety - Prediction Hints)
|
||||||
@ -156,6 +158,19 @@ static inline void* tiny_hot_alloc_fast(int class_idx) {
|
|||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
case 6:
|
case 6:
|
||||||
|
// Phase 91: C6 Intrusive LIFO Inline Slots (check BEFORE FIFO)
|
||||||
|
if (tiny_c6_inline_slots_ifl_enabled_fast()) {
|
||||||
|
void* base = tiny_c6_inline_slots_ifl_pop_fast();
|
||||||
|
if (TINY_HOT_LIKELY(base != NULL)) {
|
||||||
|
TINY_HOT_METRICS_HIT(class_idx);
|
||||||
|
#if HAKMEM_TINY_HEADER_CLASSIDX
|
||||||
|
return tiny_header_finalize_alloc(base, class_idx);
|
||||||
|
#else
|
||||||
|
return base;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Phase 75-1: C6 Inline Slots (FIFO - fallback)
|
||||||
if (tiny_c6_inline_slots_enabled_fast()) {
|
if (tiny_c6_inline_slots_enabled_fast()) {
|
||||||
void* base = c6_inline_pop(c6_inline_tls());
|
void* base = c6_inline_pop(c6_inline_tls());
|
||||||
if (TINY_HOT_LIKELY(base != NULL)) {
|
if (TINY_HOT_LIKELY(base != NULL)) {
|
||||||
@ -222,6 +237,21 @@ static inline void* tiny_hot_alloc_fast(int class_idx) {
|
|||||||
// C5 inline miss → fall through to C6/unified cache
|
// C5 inline miss → fall through to C6/unified cache
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Phase 91: C6 Intrusive LIFO Inline Slots early-exit (ENV gated)
|
||||||
|
// Try C6 IFL THIRD (before C6 FIFO and unified cache) for class 6
|
||||||
|
if (class_idx == 6 && tiny_c6_inline_slots_ifl_enabled_fast()) {
|
||||||
|
void* base = tiny_c6_inline_slots_ifl_pop_fast();
|
||||||
|
if (TINY_HOT_LIKELY(base != NULL)) {
|
||||||
|
TINY_HOT_METRICS_HIT(class_idx);
|
||||||
|
#if HAKMEM_TINY_HEADER_CLASSIDX
|
||||||
|
return tiny_header_finalize_alloc(base, class_idx);
|
||||||
|
#else
|
||||||
|
return base;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
// C6 IFL miss → fall through to C6 FIFO
|
||||||
|
}
|
||||||
|
|
||||||
// Phase 75-1: C6 Inline Slots early-exit (ENV gated)
|
// Phase 75-1: C6 Inline Slots early-exit (ENV gated)
|
||||||
// Try C6 inline slots THIRD (before unified cache) for class 6
|
// Try C6 inline slots THIRD (before unified cache) for class 6
|
||||||
if (class_idx == 6 && tiny_c6_inline_slots_enabled_fast()) {
|
if (class_idx == 6 && tiny_c6_inline_slots_enabled_fast()) {
|
||||||
|
|||||||
153
core/box/tiny_inline_slots_overflow_stats_box.c
Normal file
153
core/box/tiny_inline_slots_overflow_stats_box.c
Normal file
@ -0,0 +1,153 @@
|
|||||||
|
// tiny_inline_slots_overflow_stats_box.c - Phase 87: Inline Slots Overflow Telemetry
|
||||||
|
//
|
||||||
|
// Measures how often inline slots rings overflow and fallback to unified_cache/legacy paths.
|
||||||
|
|
||||||
|
#include "tiny_inline_slots_overflow_stats_box.h"
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <stdatomic.h>
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Global State
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
TinyInlineSlotsOverflowStats g_inline_slots_overflow_stats = {
|
||||||
|
.c3_push_full = 0,
|
||||||
|
.c4_push_full = 0,
|
||||||
|
.c5_push_full = 0,
|
||||||
|
.c6_push_full = 0,
|
||||||
|
.c3_pop_empty = 0,
|
||||||
|
.c4_pop_empty = 0,
|
||||||
|
.c5_pop_empty = 0,
|
||||||
|
.c6_pop_empty = 0,
|
||||||
|
.overflow_to_unified_cache = 0,
|
||||||
|
.overflow_to_legacy = 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Refresh from ENV (called by bench_profile)
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
void tiny_inline_slots_overflow_refresh_from_env(void) {
|
||||||
|
// Placeholder for future ENV gating if needed
|
||||||
|
// Currently always enabled in observation builds (controlled by compile flag)
|
||||||
|
}
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Reporting
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
void tiny_inline_slots_overflow_report_stats(void) {
|
||||||
|
// Phase 87b: Legacy fallback counter
|
||||||
|
uint64_t legacy_fallback_calls = atomic_load(&g_inline_slots_overflow_stats.legacy_fallback_calls);
|
||||||
|
|
||||||
|
// Total push attempts (all classes)
|
||||||
|
uint64_t c3_push_total = atomic_load(&g_inline_slots_overflow_stats.c3_push_total);
|
||||||
|
uint64_t c4_push_total = atomic_load(&g_inline_slots_overflow_stats.c4_push_total);
|
||||||
|
uint64_t c5_push_total = atomic_load(&g_inline_slots_overflow_stats.c5_push_total);
|
||||||
|
uint64_t c6_push_total = atomic_load(&g_inline_slots_overflow_stats.c6_push_total);
|
||||||
|
|
||||||
|
// Total pop attempts (all classes)
|
||||||
|
uint64_t c3_pop_total = atomic_load(&g_inline_slots_overflow_stats.c3_pop_total);
|
||||||
|
uint64_t c4_pop_total = atomic_load(&g_inline_slots_overflow_stats.c4_pop_total);
|
||||||
|
uint64_t c5_pop_total = atomic_load(&g_inline_slots_overflow_stats.c5_pop_total);
|
||||||
|
uint64_t c6_pop_total = atomic_load(&g_inline_slots_overflow_stats.c6_pop_total);
|
||||||
|
|
||||||
|
// Overflow counts (ring full/empty)
|
||||||
|
uint64_t c3_push_full = atomic_load(&g_inline_slots_overflow_stats.c3_push_full);
|
||||||
|
uint64_t c4_push_full = atomic_load(&g_inline_slots_overflow_stats.c4_push_full);
|
||||||
|
uint64_t c5_push_full = atomic_load(&g_inline_slots_overflow_stats.c5_push_full);
|
||||||
|
uint64_t c6_push_full = atomic_load(&g_inline_slots_overflow_stats.c6_push_full);
|
||||||
|
|
||||||
|
uint64_t c3_pop_empty = atomic_load(&g_inline_slots_overflow_stats.c3_pop_empty);
|
||||||
|
uint64_t c4_pop_empty = atomic_load(&g_inline_slots_overflow_stats.c4_pop_empty);
|
||||||
|
uint64_t c5_pop_empty = atomic_load(&g_inline_slots_overflow_stats.c5_pop_empty);
|
||||||
|
uint64_t c6_pop_empty = atomic_load(&g_inline_slots_overflow_stats.c6_pop_empty);
|
||||||
|
|
||||||
|
uint64_t overflow_to_uc = atomic_load(&g_inline_slots_overflow_stats.overflow_to_unified_cache);
|
||||||
|
uint64_t overflow_to_legacy = atomic_load(&g_inline_slots_overflow_stats.overflow_to_legacy);
|
||||||
|
|
||||||
|
// Totals
|
||||||
|
uint64_t total_push_total = c3_push_total + c4_push_total + c5_push_total + c6_push_total;
|
||||||
|
uint64_t total_pop_total = c3_pop_total + c4_pop_total + c5_pop_total + c6_pop_total;
|
||||||
|
uint64_t total_push_full = c3_push_full + c4_push_full + c5_push_full + c6_push_full;
|
||||||
|
uint64_t total_pop_empty = c3_pop_empty + c4_pop_empty + c5_pop_empty + c6_pop_empty;
|
||||||
|
uint64_t total_overflow = overflow_to_uc + overflow_to_legacy;
|
||||||
|
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "=== PHASE 87: INLINE SLOTS OVERFLOW STATS ===\n");
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "PUSH TOTAL (Free Path Attempts - Verify inline slots called):\n");
|
||||||
|
fprintf(stderr, " C3: %10llu\n", (unsigned long long)c3_push_total);
|
||||||
|
fprintf(stderr, " C4: %10llu\n", (unsigned long long)c4_push_total);
|
||||||
|
fprintf(stderr, " C5: %10llu\n", (unsigned long long)c5_push_total);
|
||||||
|
fprintf(stderr, " C6: %10llu\n", (unsigned long long)c6_push_total);
|
||||||
|
fprintf(stderr, " TOTAL: %6llu\n", (unsigned long long)total_push_total);
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "PUSH FULL (Free Path Ring Overflow):\n");
|
||||||
|
fprintf(stderr, " C3: %10llu", (unsigned long long)c3_push_full);
|
||||||
|
if (c3_push_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * c3_push_full / c3_push_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, " C4: %10llu", (unsigned long long)c4_push_full);
|
||||||
|
if (c4_push_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * c4_push_full / c4_push_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, " C5: %10llu", (unsigned long long)c5_push_full);
|
||||||
|
if (c5_push_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * c5_push_full / c5_push_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, " C6: %10llu", (unsigned long long)c6_push_full);
|
||||||
|
if (c6_push_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * c6_push_full / c6_push_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, " TOTAL: %6llu", (unsigned long long)total_push_full);
|
||||||
|
if (total_push_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * total_push_full / total_push_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "POP TOTAL (Alloc Path Attempts - Verify inline slots called):\n");
|
||||||
|
fprintf(stderr, " C3: %10llu\n", (unsigned long long)c3_pop_total);
|
||||||
|
fprintf(stderr, " C4: %10llu\n", (unsigned long long)c4_pop_total);
|
||||||
|
fprintf(stderr, " C5: %10llu\n", (unsigned long long)c5_pop_total);
|
||||||
|
fprintf(stderr, " C6: %10llu\n", (unsigned long long)c6_pop_total);
|
||||||
|
fprintf(stderr, " TOTAL: %6llu\n", (unsigned long long)total_pop_total);
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "POP EMPTY (Alloc Path Ring Underflow):\n");
|
||||||
|
fprintf(stderr, " C3: %10llu", (unsigned long long)c3_pop_empty);
|
||||||
|
if (c3_pop_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * c3_pop_empty / c3_pop_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, " C4: %10llu", (unsigned long long)c4_pop_empty);
|
||||||
|
if (c4_pop_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * c4_pop_empty / c4_pop_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, " C5: %10llu", (unsigned long long)c5_pop_empty);
|
||||||
|
if (c5_pop_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * c5_pop_empty / c5_pop_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, " C6: %10llu", (unsigned long long)c6_pop_empty);
|
||||||
|
if (c6_pop_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * c6_pop_empty / c6_pop_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, " TOTAL: %6llu", (unsigned long long)total_pop_empty);
|
||||||
|
if (total_pop_total > 0) fprintf(stderr, " (%.2f%%)\n", 100.0 * total_pop_empty / total_pop_total);
|
||||||
|
else fprintf(stderr, " (N/A)\n");
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "OVERFLOW DESTINATIONS:\n");
|
||||||
|
fprintf(stderr, " Unified Cache: %10llu\n", (unsigned long long)overflow_to_uc);
|
||||||
|
fprintf(stderr, " Legacy Fallback: %7llu\n", (unsigned long long)overflow_to_legacy);
|
||||||
|
fprintf(stderr, " TOTAL: %14llu\n", (unsigned long long)total_overflow);
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "=== PHASE 87b: CALL PATH VERIFICATION ===\n");
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "LEGACY FALLBACK CALLS (Free path route verification):\n");
|
||||||
|
fprintf(stderr, " tiny_legacy_fallback_free_base_with_env: %llu\n", (unsigned long long)legacy_fallback_calls);
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "JUDGMENT:\n");
|
||||||
|
if (legacy_fallback_calls == 0) {
|
||||||
|
fprintf(stderr, " ⚠️ [A] LEGACY fallback NOT used → Alternate free path (not expected)\n");
|
||||||
|
} else if (total_push_total == 0 && total_pop_total == 0) {
|
||||||
|
fprintf(stderr, " ⚠️ [B] LEGACY used, but C4/C5/C6 INLINE SLOTS DISABLED → enable=OFF\n");
|
||||||
|
} else if (total_push_total > 0 || total_pop_total > 0) {
|
||||||
|
fprintf(stderr, " ✓ [C] LEGACY used AND C4/C5/C6 INLINE SLOTS ACTIVE → Ready for Phase 88/89\n");
|
||||||
|
fprintf(stderr, " Push activity: %llu, Pop activity: %llu\n",
|
||||||
|
(unsigned long long)total_push_total, (unsigned long long)total_pop_total);
|
||||||
|
}
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fprintf(stderr, "===========================================\n");
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
fflush(stderr);
|
||||||
|
}
|
||||||
155
core/box/tiny_inline_slots_overflow_stats_box.h
Normal file
155
core/box/tiny_inline_slots_overflow_stats_box.h
Normal file
@ -0,0 +1,155 @@
|
|||||||
|
// tiny_inline_slots_overflow_stats_box.h - Phase 87: Inline Slots Overflow Telemetry
|
||||||
|
//
|
||||||
|
// Purpose: Measure overflow frequency for C3/C4/C5/C6 inline slots to determine
|
||||||
|
// if batch drain (Phase 88) is worth implementing.
|
||||||
|
//
|
||||||
|
// Metrics:
|
||||||
|
// - push_full: When free path TLS ring is FULL, must fallback to unified_cache/legacy
|
||||||
|
// - pop_empty: When alloc path TLS ring is EMPTY, must fetch from unified_cache/SuperSlab
|
||||||
|
// - overflow_to_uc: Fallback to unified_cache (before legacy path)
|
||||||
|
// - overflow_to_legacy: Final fallback when unified_cache also full
|
||||||
|
//
|
||||||
|
// Usage:
|
||||||
|
// - Compile-time: Only enabled in observation builds (not RELEASE) unless explicitly enabled.
|
||||||
|
// - Call tiny_inline_slots_overflow_report_stats() on exit to print summary
|
||||||
|
//
|
||||||
|
// Compile gate:
|
||||||
|
// - HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=0/1 (default 0)
|
||||||
|
|
||||||
|
#ifndef HAK_BOX_TINY_INLINE_SLOTS_OVERFLOW_STATS_BOX_H
|
||||||
|
#define HAK_BOX_TINY_INLINE_SLOTS_OVERFLOW_STATS_BOX_H
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <stdatomic.h>
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Global Counters (per-class overflow tracking)
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
// C3/C4/C5/C6 push attempts (free path: total attempts)
|
||||||
|
_Atomic uint64_t c3_push_total;
|
||||||
|
_Atomic uint64_t c4_push_total;
|
||||||
|
_Atomic uint64_t c5_push_total;
|
||||||
|
_Atomic uint64_t c6_push_total;
|
||||||
|
|
||||||
|
// C3/C4/C5/C6 push_full (free path: TLS ring FULL)
|
||||||
|
_Atomic uint64_t c3_push_full;
|
||||||
|
_Atomic uint64_t c4_push_full;
|
||||||
|
_Atomic uint64_t c5_push_full;
|
||||||
|
_Atomic uint64_t c6_push_full;
|
||||||
|
|
||||||
|
// C3/C4/C5/C6 pop attempts (alloc path: total attempts)
|
||||||
|
_Atomic uint64_t c3_pop_total;
|
||||||
|
_Atomic uint64_t c4_pop_total;
|
||||||
|
_Atomic uint64_t c5_pop_total;
|
||||||
|
_Atomic uint64_t c6_pop_total;
|
||||||
|
|
||||||
|
// C3/C4/C5/C6 pop_empty (alloc path: TLS ring EMPTY)
|
||||||
|
_Atomic uint64_t c3_pop_empty;
|
||||||
|
_Atomic uint64_t c4_pop_empty;
|
||||||
|
_Atomic uint64_t c5_pop_empty;
|
||||||
|
_Atomic uint64_t c6_pop_empty;
|
||||||
|
|
||||||
|
// Overflow destinations
|
||||||
|
_Atomic uint64_t overflow_to_unified_cache; // fallback when inline ring full
|
||||||
|
_Atomic uint64_t overflow_to_legacy; // fallback when unified_cache also full
|
||||||
|
|
||||||
|
// Phase 87b: Legacy fallback counter (verify actual call paths)
|
||||||
|
_Atomic uint64_t legacy_fallback_calls; // total calls to tiny_legacy_fallback_free_base_with_env
|
||||||
|
} TinyInlineSlotsOverflowStats;
|
||||||
|
|
||||||
|
extern TinyInlineSlotsOverflowStats g_inline_slots_overflow_stats;
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Refresh from ENV (at init time)
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
void tiny_inline_slots_overflow_refresh_from_env(void);
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Reporting
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
void tiny_inline_slots_overflow_report_stats(void);
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Fast-path APIs (inlined, minimal overhead when disabled)
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
__attribute__((always_inline))
|
||||||
|
static inline int tiny_inline_slots_overflow_enabled(void) {
|
||||||
|
// Compile-time control (header-only hot-path helpers).
|
||||||
|
// Default is OFF in release; enable for OBSERVE/research builds as needed.
|
||||||
|
#if !HAKMEM_BUILD_RELEASE || HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED
|
||||||
|
return 1;
|
||||||
|
#else
|
||||||
|
return 0;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
__attribute__((always_inline))
|
||||||
|
static inline void tiny_inline_slots_count_push_total(int class_idx) {
|
||||||
|
if (__builtin_expect(!tiny_inline_slots_overflow_enabled(), 1)) return;
|
||||||
|
|
||||||
|
switch (class_idx) {
|
||||||
|
case 3: atomic_fetch_add(&g_inline_slots_overflow_stats.c3_push_total, 1); break;
|
||||||
|
case 4: atomic_fetch_add(&g_inline_slots_overflow_stats.c4_push_total, 1); break;
|
||||||
|
case 5: atomic_fetch_add(&g_inline_slots_overflow_stats.c5_push_total, 1); break;
|
||||||
|
case 6: atomic_fetch_add(&g_inline_slots_overflow_stats.c6_push_total, 1); break;
|
||||||
|
default: break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
__attribute__((always_inline))
|
||||||
|
static inline void tiny_inline_slots_count_push_full(int class_idx) {
|
||||||
|
if (__builtin_expect(!tiny_inline_slots_overflow_enabled(), 1)) return;
|
||||||
|
|
||||||
|
switch (class_idx) {
|
||||||
|
case 3: atomic_fetch_add(&g_inline_slots_overflow_stats.c3_push_full, 1); break;
|
||||||
|
case 4: atomic_fetch_add(&g_inline_slots_overflow_stats.c4_push_full, 1); break;
|
||||||
|
case 5: atomic_fetch_add(&g_inline_slots_overflow_stats.c5_push_full, 1); break;
|
||||||
|
case 6: atomic_fetch_add(&g_inline_slots_overflow_stats.c6_push_full, 1); break;
|
||||||
|
default: break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
__attribute__((always_inline))
|
||||||
|
static inline void tiny_inline_slots_count_pop_total(int class_idx) {
|
||||||
|
if (__builtin_expect(!tiny_inline_slots_overflow_enabled(), 1)) return;
|
||||||
|
|
||||||
|
switch (class_idx) {
|
||||||
|
case 3: atomic_fetch_add(&g_inline_slots_overflow_stats.c3_pop_total, 1); break;
|
||||||
|
case 4: atomic_fetch_add(&g_inline_slots_overflow_stats.c4_pop_total, 1); break;
|
||||||
|
case 5: atomic_fetch_add(&g_inline_slots_overflow_stats.c5_pop_total, 1); break;
|
||||||
|
case 6: atomic_fetch_add(&g_inline_slots_overflow_stats.c6_pop_total, 1); break;
|
||||||
|
default: break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
__attribute__((always_inline))
|
||||||
|
static inline void tiny_inline_slots_count_pop_empty(int class_idx) {
|
||||||
|
if (__builtin_expect(!tiny_inline_slots_overflow_enabled(), 1)) return;
|
||||||
|
|
||||||
|
switch (class_idx) {
|
||||||
|
case 3: atomic_fetch_add(&g_inline_slots_overflow_stats.c3_pop_empty, 1); break;
|
||||||
|
case 4: atomic_fetch_add(&g_inline_slots_overflow_stats.c4_pop_empty, 1); break;
|
||||||
|
case 5: atomic_fetch_add(&g_inline_slots_overflow_stats.c5_pop_empty, 1); break;
|
||||||
|
case 6: atomic_fetch_add(&g_inline_slots_overflow_stats.c6_pop_empty, 1); break;
|
||||||
|
default: break;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
__attribute__((always_inline))
|
||||||
|
static inline void tiny_inline_slots_count_overflow_to_uc(void) {
|
||||||
|
if (__builtin_expect(!tiny_inline_slots_overflow_enabled(), 1)) return;
|
||||||
|
atomic_fetch_add(&g_inline_slots_overflow_stats.overflow_to_unified_cache, 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
__attribute__((always_inline))
|
||||||
|
static inline void tiny_inline_slots_count_overflow_to_legacy(void) {
|
||||||
|
if (__builtin_expect(!tiny_inline_slots_overflow_enabled(), 1)) return;
|
||||||
|
atomic_fetch_add(&g_inline_slots_overflow_stats.overflow_to_legacy, 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif // HAK_BOX_TINY_INLINE_SLOTS_OVERFLOW_STATS_BOX_H
|
||||||
@ -25,6 +25,9 @@
|
|||||||
#include "tiny_inline_slots_fixed_mode_box.h" // Phase 78-1: Optional fixed-mode gating
|
#include "tiny_inline_slots_fixed_mode_box.h" // Phase 78-1: Optional fixed-mode gating
|
||||||
#include "tiny_inline_slots_switch_dispatch_box.h" // Phase 80-1: Switch dispatch for C4/C5/C6
|
#include "tiny_inline_slots_switch_dispatch_box.h" // Phase 80-1: Switch dispatch for C4/C5/C6
|
||||||
#include "tiny_inline_slots_switch_dispatch_fixed_box.h" // Phase 83-1: Switch dispatch fixed mode
|
#include "tiny_inline_slots_switch_dispatch_fixed_box.h" // Phase 83-1: Switch dispatch fixed mode
|
||||||
|
#include "tiny_inline_slots_overflow_stats_box.h" // Phase 87b: Legacy fallback counter
|
||||||
|
#include "tiny_c6_inline_slots_ifl_env_box.h" // Phase 91: C6 intrusive LIFO inline slots ENV gate
|
||||||
|
#include "tiny_c6_inline_slots_ifl_tls_box.h" // Phase 91: C6 intrusive LIFO inline slots TLS state
|
||||||
|
|
||||||
// Purpose: Encapsulate legacy free logic (shared by multiple paths)
|
// Purpose: Encapsulate legacy free logic (shared by multiple paths)
|
||||||
// Called by: malloc_tiny_fast.h (free path) + tiny_c6_ultra_free_box.c (C6 fallback)
|
// Called by: malloc_tiny_fast.h (free path) + tiny_c6_ultra_free_box.c (C6 fallback)
|
||||||
@ -36,6 +39,9 @@
|
|||||||
//
|
//
|
||||||
__attribute__((always_inline))
|
__attribute__((always_inline))
|
||||||
static inline void tiny_legacy_fallback_free_base_with_env(void* base, uint32_t class_idx, const HakmemEnvSnapshot* env) {
|
static inline void tiny_legacy_fallback_free_base_with_env(void* base, uint32_t class_idx, const HakmemEnvSnapshot* env) {
|
||||||
|
// Phase 87b: Count legacy fallback calls for verification
|
||||||
|
atomic_fetch_add(&g_inline_slots_overflow_stats.legacy_fallback_calls, 1);
|
||||||
|
|
||||||
// Phase 80-1: Switch dispatch for C4/C5/C6 (branch reduction optimization)
|
// Phase 80-1: Switch dispatch for C4/C5/C6 (branch reduction optimization)
|
||||||
// Phase 83-1: Per-op branch removed via fixed-mode caching
|
// Phase 83-1: Per-op branch removed via fixed-mode caching
|
||||||
// C2/C3 excluded (NO-GO from Phase 77-1/79-1)
|
// C2/C3 excluded (NO-GO from Phase 77-1/79-1)
|
||||||
@ -65,6 +71,17 @@ static inline void tiny_legacy_fallback_free_base_with_env(void* base, uint32_t
|
|||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
case 6:
|
case 6:
|
||||||
|
// Phase 91: C6 Intrusive LIFO Inline Slots (check BEFORE FIFO)
|
||||||
|
if (tiny_c6_inline_slots_ifl_enabled_fast()) {
|
||||||
|
if (tiny_c6_inline_slots_ifl_push_fast(base)) {
|
||||||
|
FREE_PATH_STAT_INC(legacy_fallback);
|
||||||
|
if (__builtin_expect(free_path_stats_enabled(), 0)) {
|
||||||
|
g_free_path_stats.legacy_by_class[class_idx]++;
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Phase 75-1: C6 Inline Slots (FIFO - fallback)
|
||||||
if (tiny_c6_inline_slots_enabled_fast()) {
|
if (tiny_c6_inline_slots_enabled_fast()) {
|
||||||
if (c6_inline_push(c6_inline_tls(), base)) {
|
if (c6_inline_push(c6_inline_tls(), base)) {
|
||||||
FREE_PATH_STAT_INC(legacy_fallback);
|
FREE_PATH_STAT_INC(legacy_fallback);
|
||||||
@ -126,6 +143,20 @@ static inline void tiny_legacy_fallback_free_base_with_env(void* base, uint32_t
|
|||||||
// FULL → fall through to C6/unified cache
|
// FULL → fall through to C6/unified cache
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Phase 91: C6 Intrusive LIFO Inline Slots early-exit (ENV gated)
|
||||||
|
// Try C6 IFL THIRD (before C6 FIFO and unified cache) for class 6
|
||||||
|
if (class_idx == 6 && tiny_c6_inline_slots_ifl_enabled_fast()) {
|
||||||
|
if (tiny_c6_inline_slots_ifl_push_fast(base)) {
|
||||||
|
// Success: pushed to C6 IFL
|
||||||
|
FREE_PATH_STAT_INC(legacy_fallback);
|
||||||
|
if (__builtin_expect(free_path_stats_enabled(), 0)) {
|
||||||
|
g_free_path_stats.legacy_by_class[class_idx]++;
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
// FULL → fall through to C6 FIFO
|
||||||
|
}
|
||||||
|
|
||||||
// Phase 75-1: C6 Inline Slots early-exit (ENV gated)
|
// Phase 75-1: C6 Inline Slots early-exit (ENV gated)
|
||||||
// Try C6 inline slots THIRD (before unified cache) for class 6
|
// Try C6 inline slots THIRD (before unified cache) for class 6
|
||||||
if (class_idx == 6 && tiny_c6_inline_slots_enabled_fast()) {
|
if (class_idx == 6 && tiny_c6_inline_slots_enabled_fast()) {
|
||||||
|
|||||||
@ -26,6 +26,7 @@
|
|||||||
#include "../box/tiny_c3_inline_slots_tls_box.h"
|
#include "../box/tiny_c3_inline_slots_tls_box.h"
|
||||||
#include "../box/tiny_c3_inline_slots_env_box.h"
|
#include "../box/tiny_c3_inline_slots_env_box.h"
|
||||||
#include "../box/tiny_inline_slots_fixed_mode_box.h"
|
#include "../box/tiny_inline_slots_fixed_mode_box.h"
|
||||||
|
#include "../box/tiny_inline_slots_overflow_stats_box.h"
|
||||||
|
|
||||||
// ============================================================================
|
// ============================================================================
|
||||||
// C3 Inline Slots: Fast-Path Push/Pop (Always-Inline)
|
// C3 Inline Slots: Fast-Path Push/Pop (Always-Inline)
|
||||||
@ -42,8 +43,11 @@ static inline TinyC3InlineSlots* c3_inline_tls(void) {
|
|||||||
// Returns: 1 if success, 0 if full (caller must fallback to unified_cache)
|
// Returns: 1 if success, 0 if full (caller must fallback to unified_cache)
|
||||||
__attribute__((always_inline))
|
__attribute__((always_inline))
|
||||||
static inline int c3_inline_push(TinyC3InlineSlots* slots, void* ptr) {
|
static inline int c3_inline_push(TinyC3InlineSlots* slots, void* ptr) {
|
||||||
|
tiny_inline_slots_count_push_total(3); // Phase 87: Telemetry (all attempts)
|
||||||
|
|
||||||
// Check if ring is full
|
// Check if ring is full
|
||||||
if (__builtin_expect(c3_inline_full(slots), 0)) {
|
if (__builtin_expect(c3_inline_full(slots), 0)) {
|
||||||
|
tiny_inline_slots_count_push_full(3); // Phase 87: Telemetry (overflow)
|
||||||
return 0; // Full, caller must use unified_cache
|
return 0; // Full, caller must use unified_cache
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -58,8 +62,11 @@ static inline int c3_inline_push(TinyC3InlineSlots* slots, void* ptr) {
|
|||||||
// Returns: non-NULL if success, NULL if empty (caller must fallback to unified_cache)
|
// Returns: non-NULL if success, NULL if empty (caller must fallback to unified_cache)
|
||||||
__attribute__((always_inline))
|
__attribute__((always_inline))
|
||||||
static inline void* c3_inline_pop(TinyC3InlineSlots* slots) {
|
static inline void* c3_inline_pop(TinyC3InlineSlots* slots) {
|
||||||
|
tiny_inline_slots_count_pop_total(3); // Phase 87: Telemetry (all attempts)
|
||||||
|
|
||||||
// Check if ring is empty
|
// Check if ring is empty
|
||||||
if (__builtin_expect(c3_inline_empty(slots), 0)) {
|
if (__builtin_expect(c3_inline_empty(slots), 0)) {
|
||||||
|
tiny_inline_slots_count_pop_empty(3); // Phase 87: Telemetry (underflow)
|
||||||
return NULL; // Empty, caller must use unified_cache
|
return NULL; // Empty, caller must use unified_cache
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -25,6 +25,7 @@
|
|||||||
#include "../box/tiny_c4_inline_slots_env_box.h"
|
#include "../box/tiny_c4_inline_slots_env_box.h"
|
||||||
#include "../box/tiny_c4_inline_slots_tls_box.h"
|
#include "../box/tiny_c4_inline_slots_tls_box.h"
|
||||||
#include "../box/tiny_inline_slots_fixed_mode_box.h"
|
#include "../box/tiny_inline_slots_fixed_mode_box.h"
|
||||||
|
#include "../box/tiny_inline_slots_overflow_stats_box.h"
|
||||||
|
|
||||||
// ============================================================================
|
// ============================================================================
|
||||||
// Fast-Path API (always_inline for zero branch overhead)
|
// Fast-Path API (always_inline for zero branch overhead)
|
||||||
@ -35,8 +36,11 @@
|
|||||||
// Precondition: ptr is valid BASE pointer for C4 class
|
// Precondition: ptr is valid BASE pointer for C4 class
|
||||||
__attribute__((always_inline))
|
__attribute__((always_inline))
|
||||||
static inline int c4_inline_push(TinyC4InlineSlots* slots, void* ptr) {
|
static inline int c4_inline_push(TinyC4InlineSlots* slots, void* ptr) {
|
||||||
|
tiny_inline_slots_count_push_total(4); // Phase 87: Telemetry (all attempts)
|
||||||
|
|
||||||
// Full check (single branch, likely taken in steady state)
|
// Full check (single branch, likely taken in steady state)
|
||||||
if (__builtin_expect(c4_inline_full(slots), 0)) {
|
if (__builtin_expect(c4_inline_full(slots), 0)) {
|
||||||
|
tiny_inline_slots_count_push_full(4); // Phase 87: Telemetry (overflow)
|
||||||
return 0; // Full, caller must fallback
|
return 0; // Full, caller must fallback
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -52,8 +56,11 @@ static inline int c4_inline_push(TinyC4InlineSlots* slots, void* ptr) {
|
|||||||
// Precondition: slots is initialized and enabled
|
// Precondition: slots is initialized and enabled
|
||||||
__attribute__((always_inline))
|
__attribute__((always_inline))
|
||||||
static inline void* c4_inline_pop(TinyC4InlineSlots* slots) {
|
static inline void* c4_inline_pop(TinyC4InlineSlots* slots) {
|
||||||
|
tiny_inline_slots_count_pop_total(4); // Phase 87: Telemetry (all attempts)
|
||||||
|
|
||||||
// Empty check (single branch, likely NOT taken in steady state)
|
// Empty check (single branch, likely NOT taken in steady state)
|
||||||
if (__builtin_expect(c4_inline_empty(slots), 0)) {
|
if (__builtin_expect(c4_inline_empty(slots), 0)) {
|
||||||
|
tiny_inline_slots_count_pop_empty(4); // Phase 87: Telemetry (underflow)
|
||||||
return NULL; // Empty, caller must fallback
|
return NULL; // Empty, caller must fallback
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -25,6 +25,7 @@
|
|||||||
#include "../box/tiny_c5_inline_slots_env_box.h"
|
#include "../box/tiny_c5_inline_slots_env_box.h"
|
||||||
#include "../box/tiny_c5_inline_slots_tls_box.h"
|
#include "../box/tiny_c5_inline_slots_tls_box.h"
|
||||||
#include "../box/tiny_inline_slots_fixed_mode_box.h"
|
#include "../box/tiny_inline_slots_fixed_mode_box.h"
|
||||||
|
#include "../box/tiny_inline_slots_overflow_stats_box.h"
|
||||||
|
|
||||||
// ============================================================================
|
// ============================================================================
|
||||||
// Fast-Path API (always_inline for zero branch overhead)
|
// Fast-Path API (always_inline for zero branch overhead)
|
||||||
@ -35,8 +36,11 @@
|
|||||||
// Precondition: ptr is valid BASE pointer for C5 class
|
// Precondition: ptr is valid BASE pointer for C5 class
|
||||||
__attribute__((always_inline))
|
__attribute__((always_inline))
|
||||||
static inline int c5_inline_push(TinyC5InlineSlots* slots, void* ptr) {
|
static inline int c5_inline_push(TinyC5InlineSlots* slots, void* ptr) {
|
||||||
|
tiny_inline_slots_count_push_total(5); // Phase 87: Telemetry (all attempts)
|
||||||
|
|
||||||
// Full check (single branch, likely taken in steady state)
|
// Full check (single branch, likely taken in steady state)
|
||||||
if (__builtin_expect(c5_inline_full(slots), 0)) {
|
if (__builtin_expect(c5_inline_full(slots), 0)) {
|
||||||
|
tiny_inline_slots_count_push_full(5); // Phase 87: Telemetry (overflow)
|
||||||
return 0; // Full, caller must fallback
|
return 0; // Full, caller must fallback
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -52,8 +56,11 @@ static inline int c5_inline_push(TinyC5InlineSlots* slots, void* ptr) {
|
|||||||
// Precondition: slots is initialized and enabled
|
// Precondition: slots is initialized and enabled
|
||||||
__attribute__((always_inline))
|
__attribute__((always_inline))
|
||||||
static inline void* c5_inline_pop(TinyC5InlineSlots* slots) {
|
static inline void* c5_inline_pop(TinyC5InlineSlots* slots) {
|
||||||
|
tiny_inline_slots_count_pop_total(5); // Phase 87: Telemetry (all attempts)
|
||||||
|
|
||||||
// Empty check (single branch, likely NOT taken in steady state)
|
// Empty check (single branch, likely NOT taken in steady state)
|
||||||
if (__builtin_expect(c5_inline_empty(slots), 0)) {
|
if (__builtin_expect(c5_inline_empty(slots), 0)) {
|
||||||
|
tiny_inline_slots_count_pop_empty(5); // Phase 87: Telemetry (underflow)
|
||||||
return NULL; // Empty, caller must fallback
|
return NULL; // Empty, caller must fallback
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -25,6 +25,7 @@
|
|||||||
#include "../box/tiny_c6_inline_slots_env_box.h"
|
#include "../box/tiny_c6_inline_slots_env_box.h"
|
||||||
#include "../box/tiny_c6_inline_slots_tls_box.h"
|
#include "../box/tiny_c6_inline_slots_tls_box.h"
|
||||||
#include "../box/tiny_inline_slots_fixed_mode_box.h"
|
#include "../box/tiny_inline_slots_fixed_mode_box.h"
|
||||||
|
#include "../box/tiny_inline_slots_overflow_stats_box.h"
|
||||||
|
|
||||||
// ============================================================================
|
// ============================================================================
|
||||||
// Fast-Path API (always_inline for zero branch overhead)
|
// Fast-Path API (always_inline for zero branch overhead)
|
||||||
@ -35,8 +36,11 @@
|
|||||||
// Precondition: ptr is valid BASE pointer for C6 class
|
// Precondition: ptr is valid BASE pointer for C6 class
|
||||||
__attribute__((always_inline))
|
__attribute__((always_inline))
|
||||||
static inline int c6_inline_push(TinyC6InlineSlots* slots, void* ptr) {
|
static inline int c6_inline_push(TinyC6InlineSlots* slots, void* ptr) {
|
||||||
|
tiny_inline_slots_count_push_total(6); // Phase 87: Telemetry (all attempts)
|
||||||
|
|
||||||
// Full check (single branch, likely taken in steady state)
|
// Full check (single branch, likely taken in steady state)
|
||||||
if (__builtin_expect(c6_inline_full(slots), 0)) {
|
if (__builtin_expect(c6_inline_full(slots), 0)) {
|
||||||
|
tiny_inline_slots_count_push_full(6); // Phase 87: Telemetry (overflow)
|
||||||
return 0; // Full, caller must fallback
|
return 0; // Full, caller must fallback
|
||||||
}
|
}
|
||||||
|
|
||||||
@ -52,8 +56,11 @@ static inline int c6_inline_push(TinyC6InlineSlots* slots, void* ptr) {
|
|||||||
// Precondition: slots is initialized and enabled
|
// Precondition: slots is initialized and enabled
|
||||||
__attribute__((always_inline))
|
__attribute__((always_inline))
|
||||||
static inline void* c6_inline_pop(TinyC6InlineSlots* slots) {
|
static inline void* c6_inline_pop(TinyC6InlineSlots* slots) {
|
||||||
|
tiny_inline_slots_count_pop_total(6); // Phase 87: Telemetry (all attempts)
|
||||||
|
|
||||||
// Empty check (single branch, likely NOT taken in steady state)
|
// Empty check (single branch, likely NOT taken in steady state)
|
||||||
if (__builtin_expect(c6_inline_empty(slots), 0)) {
|
if (__builtin_expect(c6_inline_empty(slots), 0)) {
|
||||||
|
tiny_inline_slots_count_pop_empty(6); // Phase 87: Telemetry (underflow)
|
||||||
return NULL; // Empty, caller must fallback
|
return NULL; // Empty, caller must fallback
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@ -382,6 +382,19 @@
|
|||||||
# define HAKMEM_UNIFIED_CACHE_STATS_COMPILED 0
|
# define HAKMEM_UNIFIED_CACHE_STATS_COMPILED 0
|
||||||
#endif
|
#endif
|
||||||
|
|
||||||
|
// ------------------------------------------------------------
|
||||||
|
// Phase 87: Inline Slots Overflow/Traffic Telemetry (Compile gate)
|
||||||
|
// ------------------------------------------------------------
|
||||||
|
// Inline Slots Overflow Stats: Compile gate (default OFF = compile-out)
|
||||||
|
// Set to 1 for OBSERVE/research builds that need:
|
||||||
|
// - per-class push/pop totals (to prove the path is actually exercised)
|
||||||
|
// - overflow/underflow counts (FULL/EMPTY)
|
||||||
|
//
|
||||||
|
// IMPORTANT: This must be a compile-time flag because the hot-path helpers are header-only.
|
||||||
|
#ifndef HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED
|
||||||
|
# define HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED 0
|
||||||
|
#endif
|
||||||
|
|
||||||
// ------------------------------------------------------------
|
// ------------------------------------------------------------
|
||||||
// Phase 29: Pool Hotbox v2 Stats Prune (Compile-out telemetry atomics)
|
// Phase 29: Pool Hotbox v2 Stats Prune (Compile-out telemetry atomics)
|
||||||
// ------------------------------------------------------------
|
// ------------------------------------------------------------
|
||||||
|
|||||||
101
core/tiny_c6_inline_slots_ifl.c
Normal file
101
core/tiny_c6_inline_slots_ifl.c
Normal file
@ -0,0 +1,101 @@
|
|||||||
|
// tiny_c6_inline_slots_ifl.c - Phase 91: C6 Intrusive LIFO Inline Slots Implementation
|
||||||
|
//
|
||||||
|
// Goal: TLS variable definition, ENV refresh, overflow handler
|
||||||
|
// Scope: Per-thread LIFO state, initialization, drain to unified_cache
|
||||||
|
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include "box/tiny_c6_inline_slots_ifl_env_box.h"
|
||||||
|
#include "box/tiny_c6_inline_slots_ifl_tls_box.h"
|
||||||
|
#include "box/tiny_unified_lifo_box.h"
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Global State (set by refresh function)
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
uint8_t g_tiny_c6_inline_slots_ifl_enabled = 0;
|
||||||
|
uint8_t g_tiny_c6_inline_slots_ifl_strict = 0;
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// TLS Variable Definition
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
// TLS instance (one per thread)
|
||||||
|
// Zero-initialized by default (head=NULL, count=0, enabled=0)
|
||||||
|
__thread struct TinyC6InlineSlotsIFL g_tiny_c6_inline_slots_ifl = {
|
||||||
|
.head = NULL,
|
||||||
|
.count = 0,
|
||||||
|
.enabled = 0,
|
||||||
|
};
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// ENV Refresh (called from bench_profile.h::refresh_all_env_caches)
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
void tiny_c6_inline_slots_ifl_refresh_from_env(void) {
|
||||||
|
// 1. Read master ENV gate
|
||||||
|
const char* env_val = getenv("HAKMEM_TINY_C6_INLINE_SLOTS_IFL");
|
||||||
|
int requested = (env_val && *env_val && *env_val != '0') ? 1 : 0;
|
||||||
|
|
||||||
|
if (!requested) {
|
||||||
|
g_tiny_c6_inline_slots_ifl_enabled = 0;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 2. Fail-fast: LARSON_FIX incompatible
|
||||||
|
// Intrusive LIFO uses next pointer in freed object header,
|
||||||
|
// cannot coexist with owner_tid validation in header
|
||||||
|
const char* larson_env = getenv("HAKMEM_TINY_LARSON_FIX");
|
||||||
|
int larson_fix_enabled = (larson_env && *larson_env && *larson_env != '0') ? 1 : 0;
|
||||||
|
|
||||||
|
if (larson_fix_enabled) {
|
||||||
|
#if !HAKMEM_BUILD_RELEASE
|
||||||
|
fprintf(stderr, "[C6-IFL] FAIL-FAST: HAKMEM_TINY_LARSON_FIX=1 incompatible with intrusive LIFO, disabling\n");
|
||||||
|
fflush(stderr);
|
||||||
|
#endif
|
||||||
|
g_tiny_c6_inline_slots_ifl_enabled = 0;
|
||||||
|
g_tiny_c6_inline_slots_ifl_strict = 1;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 3. Read strict mode (diagnostic, not enforced)
|
||||||
|
const char* strict_env = getenv("HAKMEM_TINY_C6_IFL_STRICT");
|
||||||
|
g_tiny_c6_inline_slots_ifl_strict = (strict_env && *strict_env && *strict_env != '0') ? 1 : 0;
|
||||||
|
|
||||||
|
// 4. Enable IFL for this thread
|
||||||
|
g_tiny_c6_inline_slots_ifl_enabled = 1;
|
||||||
|
g_tiny_c6_inline_slots_ifl.enabled = 1;
|
||||||
|
|
||||||
|
#if !HAKMEM_BUILD_RELEASE
|
||||||
|
fprintf(stderr, "[C6-IFL] Initialized: enabled=1, strict=%d\n",
|
||||||
|
g_tiny_c6_inline_slots_ifl_strict);
|
||||||
|
fflush(stderr);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
// ============================================================================
|
||||||
|
// Overflow Handler: Drain LIFO to Unified Cache
|
||||||
|
// ============================================================================
|
||||||
|
|
||||||
|
void tiny_c6_inline_slots_ifl_drain_to_unified(void) {
|
||||||
|
// Drain all entries from LIFO head to unified_cache
|
||||||
|
// Called when count > 128 (overflow condition)
|
||||||
|
|
||||||
|
while (g_tiny_c6_inline_slots_ifl.count > 0) {
|
||||||
|
void* ptr = tiny_c6_inline_slots_ifl_pop_fast();
|
||||||
|
if (ptr == NULL) {
|
||||||
|
break; // Should not happen if count tracking is correct
|
||||||
|
}
|
||||||
|
|
||||||
|
// Push to unified_cache LIFO for C6
|
||||||
|
int success = unified_cache_try_push_lifo(6, ptr);
|
||||||
|
if (!success) {
|
||||||
|
// Unified cache is full; this should be rare
|
||||||
|
// For now, we leak the pointer (FIXME: proper fallback)
|
||||||
|
#if !HAKMEM_BUILD_RELEASE
|
||||||
|
fprintf(stderr, "[C6-IFL-DRAIN] WARNING: unified_cache full, dropping pointer %p\n", ptr);
|
||||||
|
fflush(stderr);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
@ -2,12 +2,15 @@
|
|||||||
|
|
||||||
目的: 「数%を詰める開発」で一番きつい **ベンチが再現しない問題**を潰す。
|
目的: 「数%を詰める開発」で一番きつい **ベンチが再現しない問題**を潰す。
|
||||||
|
|
||||||
|
補助: buildの使い分けは `docs/analysis/SSOT_BUILD_MODES.md` を正とする。
|
||||||
|
|
||||||
## 1) まず結論(よくある原因)
|
## 1) まず結論(よくある原因)
|
||||||
|
|
||||||
同じマシンでも、以下が変わると 5–15% は普通に動く。
|
同じマシンでも、以下が変わると 5–15% は普通に動く。
|
||||||
|
|
||||||
- **CPU power/thermal**(governor / EPP / turbo)
|
- **CPU power/thermal**(governor / EPP / turbo)
|
||||||
- **HAKMEM_PROFILE 未指定**(route が変わる)
|
- **HAKMEM_PROFILE 未指定**(route が変わる)
|
||||||
|
- **ベンチのサイズレンジ漏れ**(`HAKMEM_BENCH_MIN_SIZE/MAX_SIZE` で class 分布が変わる)
|
||||||
- **export 漏れ**(過去の ENV が残る)
|
- **export 漏れ**(過去の ENV が残る)
|
||||||
- **別バイナリ比較**(layout tax: text 配置が変わる)
|
- **別バイナリ比較**(layout tax: text 配置が変わる)
|
||||||
|
|
||||||
@ -18,6 +21,9 @@
|
|||||||
- `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE` を明示
|
- `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE` を明示
|
||||||
- `RUNS=10`(ノイズを平均化)
|
- `RUNS=10`(ノイズを平均化)
|
||||||
- `WS=400`(SSOT)
|
- `WS=400`(SSOT)
|
||||||
|
- サイズレンジは SSOT 側で固定(runner が強制):
|
||||||
|
- `HAKMEM_BENCH_MIN_SIZE=16`
|
||||||
|
- `HAKMEM_BENCH_MAX_SIZE=1040`
|
||||||
- 任意(切り分け用):
|
- 任意(切り分け用):
|
||||||
- `HAKMEM_BENCH_ENV_LOG=1`(CPU governor/EPP/freq をログ)
|
- `HAKMEM_BENCH_ENV_LOG=1`(CPU governor/EPP/freq をログ)
|
||||||
|
|
||||||
@ -33,6 +39,7 @@ allocator比較は layout tax が混ざるため **reference**。
|
|||||||
|
|
||||||
1. SSOT実行は必ず cleanenv:
|
1. SSOT実行は必ず cleanenv:
|
||||||
- `scripts/run_mixed_10_cleanenv.sh`
|
- `scripts/run_mixed_10_cleanenv.sh`
|
||||||
|
- `SSOT_MIN_SIZE/SSOT_MAX_SIZE` でレンジを明示的に上書きできる(export 漏れの影響を受けない)
|
||||||
2. 毎回、環境ログを残す:
|
2. 毎回、環境ログを残す:
|
||||||
- `HAKMEM_BENCH_ENV_LOG=1`
|
- `HAKMEM_BENCH_ENV_LOG=1`
|
||||||
3. 結果をファイル化(後から追える形):
|
3. 結果をファイル化(後から追える形):
|
||||||
|
|||||||
@ -11,36 +11,27 @@
|
|||||||
|
|
||||||
mimalloc との比較は **FAST build** で行う(Standard は fixed tax を含むため公平でない)。
|
mimalloc との比較は **FAST build** で行う(Standard は fixed tax を含むため公平でない)。
|
||||||
|
|
||||||
## Current snapshot(2025-12-18, Phase 69 PGO + WarmPool=16 — 現行 baseline)
|
## Current snapshot(2025-12-18, Phase 89 SSOT capture — 現行 baseline)
|
||||||
|
|
||||||
計測条件(再現の正):
|
**このスコアカードの「現行の正」は Phase 89 の SSOT capture**を基準にする:
|
||||||
- Mixed: `scripts/run_mixed_10_cleanenv.sh`(`ITERS=20000000 WS=400`)
|
- SSOT capture: `docs/analysis/PHASE89_SSOT_MEASUREMENT.md`(Git SHA: `e4c5f0535`)
|
||||||
- 10-run mean/median
|
- Mixed SSOT runner: `scripts/run_mixed_10_cleanenv.sh`(`ITERS=20000000 WS=400`)
|
||||||
- Git: master (Phase 68 PGO, seed/WS diversified profile)
|
- プロファイル: `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE`
|
||||||
- **Baseline binary**: `bench_random_mixed_hakmem_minimal_pgo` (Phase 68 upgraded)
|
- SSOT を崩す最頻事故: `HAKMEM_PROFILE` 未指定 / `MIN_SIZE/MAX_SIZE` 漏れ(→経路が変わる)
|
||||||
- **Stability**: Phase 66: 3 iterations, +3.0% mean, variance <±1% | Phase 68: 10-run, +1.19% vs Phase 66 (GO)
|
|
||||||
|
|
||||||
Note:
|
### hakmem SSOT baselines(Phase 89)
|
||||||
- Phase 75 introduced C5/C6 inline slots and promoted them into presets. Phase 75 A/B results were recorded on the Standard binary (`./bench_random_mixed_hakmem`).
|
|
||||||
- FAST PGO SSOT baselines/ratios should only be updated after re-running A/B with `BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo`.
|
|
||||||
|
|
||||||
### hakmem Build Variants(同一バイナリレイアウト)
|
| Build | Mean (M ops/s) | Median (M ops/s) | 備考 |
|
||||||
|
|-------|----------------|------------------|------|
|
||||||
| Build | Mean (M ops/s) | Median (M ops/s) | vs mimalloc | 備考 |
|
| Standard | **51.36** | - | SSOT baseline(telemetryなし、最適化判断の正) |
|
||||||
|-------|----------------|------------------|-------------|------|
|
| FAST PGO minimal | **54.16** | - | SSOT ceiling(`bench_random_mixed_hakmem_minimal_pgo`)。Standard比 **+5.45%** |
|
||||||
| FAST v3 | 58.478 | 58.876 | 48.34% | 旧 baseline(Phase 59b rebase)。性能評価の正から昇格 → Phase 66 PGO へ |
|
| OBSERVE | 51.52 | - | 経路確認用(telemetry込み)。性能比較の正ではない |
|
||||||
| FAST v3 + PGO | 59.80 | 60.25 | 49.41% | Phase 47: NEUTRAL (+0.27% mean, +1.02% median, research box) |
|
|
||||||
| **FAST v3 + PGO (Phase 66)** | **60.89** | **61.35** | **50.32%** | **GO: +3.0% mean (3回検証済み、安定 <±1%)**。Phase 66 PGO initial baseline |
|
|
||||||
| **FAST v3 + PGO (Phase 68)** | **61.614** | **61.924** | **50.93%** | **GO: +1.19% vs Phase 66** ✓ (seed/WS diversification) |
|
|
||||||
| **FAST v3 + PGO (Phase 69)** | **62.63** | **63.38** | **51.77%** | **強GO: +3.26% vs Phase 68** ✓✓✓ (Warm Pool Size=16, ENV-only) → **昇格済み 新 FAST baseline** ✓ |
|
|
||||||
| FAST v3 + PGO + Phase 75 (C5+C6 ON) [Point D] | **55.51** | - | **45.70%** | Phase 75-4 FAST PGO rebase (C5+C6 inline slots): +3.16% vs Point A ✓ **[REBASE URGENT]** |
|
|
||||||
| Standard | 53.50 | - | 44.21% | 安全・互換基準(Phase 48 前計測、要 rebase) |
|
|
||||||
| OBSERVE | TBD | - | - | 診断カウンタ ON |
|
|
||||||
|
|
||||||
補足:
|
補足:
|
||||||
|
- Phase 66/68/69(60M〜62M台)は **過去コミットでの到達点(historical)**。現 HEAD の SSOT baseline と直接比較しない(比較する場合は rebase を取る)。
|
||||||
- Phase 63: `make bench_random_mixed_hakmem_fast_fixed`(`HAKMEM_FAST_PROFILE_FIXED=1`)は research build(GO 未達時は SSOT に載せない)。結果は `docs/analysis/PHASE63_FAST_PROFILE_FIXED_BUILD_RESULTS.md`。
|
- Phase 63: `make bench_random_mixed_hakmem_fast_fixed`(`HAKMEM_FAST_PROFILE_FIXED=1`)は research build(GO 未達時は SSOT に載せない)。結果は `docs/analysis/PHASE63_FAST_PROFILE_FIXED_BUILD_RESULTS.md`。
|
||||||
|
|
||||||
**FAST vs Standard delta: +10.6%**(Standard 側は Phase 48 前計測、mimalloc baseline 変更で ratio 調整)
|
**FAST vs Standard delta(Phase 89): +5.45%**
|
||||||
|
|
||||||
**Phase 59b Notes:**
|
**Phase 59b Notes:**
|
||||||
- **Profile Change**: Switched from `MIXED_TINYV3_C7_BALANCED` to `MIXED_TINYV3_C7_SAFE` (Speed-first) as canonical default
|
- **Profile Change**: Switched from `MIXED_TINYV3_C7_BALANCED` to `MIXED_TINYV3_C7_SAFE` (Speed-first) as canonical default
|
||||||
@ -92,7 +83,7 @@ scripts/bench_allocators_compare.sh --scenario mixed --iterations 50
|
|||||||
|
|
||||||
結果(2025-12-18, mixed, iterations=50):
|
結果(2025-12-18, mixed, iterations=50):
|
||||||
|
|
||||||
| allocator | ops/sec (M) | vs mimalloc (Phase 69 ref) | vs system | soft_pf | RSS (MB) |
|
| allocator | ops/sec (M) | vs mimalloc (reference) | vs system | soft_pf | RSS (MB) |
|
||||||
|----------|--------------|----------------------------|-----------|---------|----------|
|
|----------|--------------|----------------------------|-----------|---------|----------|
|
||||||
| tcmalloc (LD_PRELOAD) | 34.56 | 28.6% | 11.2x | 3,842 | 21.5 |
|
| tcmalloc (LD_PRELOAD) | 34.56 | 28.6% | 11.2x | 3,842 | 21.5 |
|
||||||
| jemalloc (LD_PRELOAD) | 24.33 | 20.1% | 7.9x | 143 | 3.8 |
|
| jemalloc (LD_PRELOAD) | 24.33 | 20.1% | 7.9x | 143 | 3.8 |
|
||||||
@ -114,16 +105,16 @@ scripts/bench_allocators_compare.sh --scenario mixed --iterations 50
|
|||||||
|
|
||||||
推奨マイルストーン(Mixed 16–1024B, FAST build):
|
推奨マイルストーン(Mixed 16–1024B, FAST build):
|
||||||
|
|
||||||
| Milestone | Target | Current (2025-12-18, corrected) | Status |
|
| Milestone | Target | Current (Phase 89 SSOT) | Status |
|
||||||
|-----------|--------|-----------------------------------|--------|
|
|-----------|--------|-----------------------------------|--------|
|
||||||
| M1 | mimalloc の **50%** | 44.46% | 🟡 **未達** (PROFILE 修正後の計測) |
|
| M1 | mimalloc の **50%** | 43.39% | 🟡 **未達** |
|
||||||
| M2 | mimalloc の **55%** | 44.46% | 🔴 **未達** (Gap: -10.54pp)|
|
| M2 | mimalloc の **55%** | 43.39% | 🔴 **未達** (Gap: -11.61pp)|
|
||||||
| M3 | mimalloc の **60%** | - | 🔴 未達(構造改造必要)|
|
| M3 | mimalloc の **60%** | - | 🔴 未達(構造改造必要)|
|
||||||
| M4 | mimalloc の **65–70%** | - | 🔴 未達(構造改造必要)|
|
| M4 | mimalloc の **65–70%** | - | 🔴 未達(構造改造必要)|
|
||||||
|
|
||||||
**現状:** hakmem (FAST PGO) (2025-12-18) = 55.53M ops/s = mimalloc の 44.46%(Random Mixed, WS=400, ITERS=20M, 10-run)
|
**現状(SSOT):** hakmem (FAST PGO minimal) = **54.16M ops/s** = mimalloc の **43.39%**(Random Mixed, WS=400, ITERS=20M, 10-run)
|
||||||
|
|
||||||
⚠️ **重要**: Phase 69 baseline (62.63M = 51.77%) は古い計測条件の可能性。PROFILE 明示修正後の新 baseline は 44.46%(M1 未達)。
|
⚠️ **重要**: Phase 66/68/69(60M〜62M台)は過去コミットでの到達点(historical)。現 HEAD との比較は `docs/analysis/PHASE67A_LAYOUT_TAX_FORENSICS_SSOT.md` に沿って rebase を取ってから行う。
|
||||||
|
|
||||||
**Phase 68 PGO 昇格(Phase 66 → Phase 68 upgrade):**
|
**Phase 68 PGO 昇格(Phase 66 → Phase 68 upgrade):**
|
||||||
- Phase 66 baseline: 60.89M ops/s = 50.32% (+3.0% mean, 3-run stable)
|
- Phase 66 baseline: 60.89M ops/s = 50.32% (+3.0% mean, 3-run stable)
|
||||||
|
|||||||
128
docs/analysis/PHASE87_INSTRUMENTATION_COMPLETE.md
Normal file
128
docs/analysis/PHASE87_INSTRUMENTATION_COMPLETE.md
Normal file
@ -0,0 +1,128 @@
|
|||||||
|
# Phase 87: Inline Slots Overflow Observation - Infrastructure Setup (COMPLETE)
|
||||||
|
|
||||||
|
## Phase 87-1: Telemetry Box Created ✓
|
||||||
|
|
||||||
|
### Files Added
|
||||||
|
|
||||||
|
1. **core/box/tiny_inline_slots_overflow_stats_box.h**
|
||||||
|
- Global counter structure: `TinyInlineSlotsOverflowStats`
|
||||||
|
- Counters: C3/C4/C5/C6 push_full, pop_empty, overflow_to_uc, overflow_to_legacy
|
||||||
|
- Fast-path inline API with `__builtin_expect()` for zero-cost when disabled
|
||||||
|
- Enabled via compile-time gate:
|
||||||
|
- `HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=0/1` (default 0)
|
||||||
|
- Non-RELEASE builds can also enable it (depending on build flags)
|
||||||
|
|
||||||
|
2. **core/box/tiny_inline_slots_overflow_stats_box.c**
|
||||||
|
- Global state initialization
|
||||||
|
- Refresh function placeholder
|
||||||
|
- Report function for final statistics output
|
||||||
|
|
||||||
|
### Makefile Integration
|
||||||
|
|
||||||
|
- Added `core/box/tiny_inline_slots_overflow_stats_box.o` to:
|
||||||
|
- OBJS_BASE
|
||||||
|
- BENCH_HAKMEM_OBJS_BASE
|
||||||
|
- TINY_BENCH_OBJS_BASE
|
||||||
|
- OBSERVE build enables telemetry explicitly:
|
||||||
|
- `make bench_random_mixed_hakmem_observe` adds `-DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1`
|
||||||
|
|
||||||
|
### Build Status
|
||||||
|
|
||||||
|
✓ Successfully compiled (no errors, no warnings in new code)
|
||||||
|
✓ Binary ready: `bench_random_mixed_hakmem`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next: Phase 87-2 - Counter Integration Points
|
||||||
|
|
||||||
|
To enable overflow measurement, counters must be injected at:
|
||||||
|
|
||||||
|
### Free Path (Push FULL)
|
||||||
|
- Location: `core/front/tiny_c6_inline_slots.h:37` (c6_inline_push)
|
||||||
|
- Trigger: When ring is FULL, return 0
|
||||||
|
- Counter: `tiny_inline_slots_count_push_full(6)`
|
||||||
|
|
||||||
|
- Similar for C3 (`core/front/tiny_c3_inline_slots.h`), C4, C5
|
||||||
|
|
||||||
|
### Alloc Path (Pop EMPTY)
|
||||||
|
- Location: `core/front/tiny_c6_inline_slots.h:54` (c6_inline_pop)
|
||||||
|
- Trigger: When ring is EMPTY, return NULL
|
||||||
|
- Counter: `tiny_inline_slots_count_pop_empty(6)`
|
||||||
|
|
||||||
|
- Similar for C3, C4, C5
|
||||||
|
|
||||||
|
### Fallback Destinations (Unified Cache)
|
||||||
|
- Location: `core/front/tiny_unified_cache.h:177-216` (unified_cache_push)
|
||||||
|
- Trigger: When unified cache is FULL, return 0
|
||||||
|
- Counter: `tiny_inline_slots_count_overflow_to_uc()`
|
||||||
|
|
||||||
|
- Also: when unified_cache_push returns 0, legacy path gets called
|
||||||
|
- Counter: `tiny_inline_slots_count_overflow_to_legacy()`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Testing Plan (Phase 87-2)
|
||||||
|
|
||||||
|
### Observation Conditions
|
||||||
|
- **Profile**: MIXED_TINYV3_C7_SAFE
|
||||||
|
- **Working Set**: WS=400 (default inline slots conditions)
|
||||||
|
- **Iterations**: 20M (ITERS=20000000)
|
||||||
|
- **Runs**: single-run OBSERVE preflight (SSOT throughput runs remain Standard/FAST)
|
||||||
|
|
||||||
|
### Expected Output
|
||||||
|
Debug build will print statistics:
|
||||||
|
```
|
||||||
|
=== PHASE 87: INLINE SLOTS OVERFLOW STATS ===
|
||||||
|
|
||||||
|
PUSH FULL (Free Path Ring Overflow):
|
||||||
|
C3: ...
|
||||||
|
C4: ...
|
||||||
|
C5: ...
|
||||||
|
C6: ...
|
||||||
|
|
||||||
|
POP EMPTY (Alloc Path Ring Underflow):
|
||||||
|
C3: ...
|
||||||
|
C4: ...
|
||||||
|
C5: ...
|
||||||
|
C6: ...
|
||||||
|
|
||||||
|
Note: `OVERFLOW DESTINATIONS` counters are optional and may remain 0 unless explicitly instrumented at fallback call sites.
|
||||||
|
```
|
||||||
|
|
||||||
|
### GO/NO-GO Decision Logic
|
||||||
|
|
||||||
|
**GO for Phase 88** if:
|
||||||
|
- `(push_full + pop_empty) / (20M * 3 runs) ≥ 0.1%`
|
||||||
|
- Indicates sufficient overflow frequency to warrant batch optimization
|
||||||
|
|
||||||
|
**NO-GO for Phase 88** if:
|
||||||
|
- Overflow rate < 0.1%
|
||||||
|
- Suggests overhead reduction ROI is minimal
|
||||||
|
- Consider alternative optimization layers
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture Notes
|
||||||
|
|
||||||
|
- Counters use `_Atomic` for thread-safety (single increment per operation)
|
||||||
|
- Zero overhead in RELEASE builds (compile-time constant folding)
|
||||||
|
- Reporting happens on exit (calls `tiny_inline_slots_overflow_report_stats()`)
|
||||||
|
- Call point: Should add to bench program exit sequence
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Files Status
|
||||||
|
|
||||||
|
| File | Status |
|
||||||
|
|------|--------|
|
||||||
|
| tiny_inline_slots_overflow_stats_box.h | ✓ Created |
|
||||||
|
| tiny_inline_slots_overflow_stats_box.c | ✓ Created |
|
||||||
|
| Makefile | ✓ Updated (object files added) |
|
||||||
|
| C3/C4/C5/C6 inline slots | ⏳ Pending counter integration |
|
||||||
|
| Observation binary build | ⏳ Pending debug build |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Ready for Phase 87-2
|
||||||
|
|
||||||
|
Next action: Inject counters into inline slots and run RUNS=3 observation.
|
||||||
102
docs/analysis/PHASE87_OBSERVATION_RESULTS.md
Normal file
102
docs/analysis/PHASE87_OBSERVATION_RESULTS.md
Normal file
@ -0,0 +1,102 @@
|
|||||||
|
# Phase 87: Inline Slots Overflow Observation Results
|
||||||
|
|
||||||
|
## Objective
|
||||||
|
Measure inline slots overflow frequency (C3/C4/C5/C6) to determine if Phase 88 (batch drain optimization) is worth implementing.
|
||||||
|
|
||||||
|
## Observation Setup
|
||||||
|
- **Workload**: Mixed SSOT (WS=400, 16-1024B allocation sizes)
|
||||||
|
- **Operations**: 20,000,000 random alloc/free operations
|
||||||
|
- **Runs**: single-run observation (OBSERVE binary)
|
||||||
|
- **Configuration**:
|
||||||
|
- Route assignments: LEGACY for all C0-C7
|
||||||
|
- Inline slots: C4/C5/C6 enabled (Phase 75/76), fixed mode ON (Phase 78), switch dispatch ON (Phase 80)
|
||||||
|
|
||||||
|
## Critical Fix (measurement correctness)
|
||||||
|
|
||||||
|
An earlier observation run reported `PUSH TOTAL/POP TOTAL = 0` for all classes.
|
||||||
|
That was **not** valid evidence that inline slots were unused.
|
||||||
|
Root cause was **telemetry compile gating**:
|
||||||
|
|
||||||
|
- `tiny_inline_slots_overflow_enabled()` is a header-only hot-path check.
|
||||||
|
- The original implementation relied on a `#define` inside `tiny_inline_slots_overflow_stats_box.c`,
|
||||||
|
which does not apply to other translation units.
|
||||||
|
- Fix: introduce `HAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED` in `core/hakmem_build_flags.h` and make the enabled check depend on it.
|
||||||
|
- OBSERVE build now enables it via Makefile: `bench_random_mixed_hakmem_observe` adds `-DHAKMEM_INLINE_SLOTS_OVERFLOW_STATS_COMPILED=1`.
|
||||||
|
|
||||||
|
## Verified Result: inline slots **are** being called (WS=400 SSOT)
|
||||||
|
|
||||||
|
### Total Operation Counts (Verification)
|
||||||
|
```
|
||||||
|
PUSH TOTAL (Free Path Attempts):
|
||||||
|
C4: 687,564
|
||||||
|
C5: 1,373,605
|
||||||
|
C6: 2,750,862
|
||||||
|
TOTAL (C4-C6): 4,812,031
|
||||||
|
|
||||||
|
POP TOTAL (Alloc Path Attempts):
|
||||||
|
C4: 687,564
|
||||||
|
C5: 1,373,605
|
||||||
|
C6: 2,750,862
|
||||||
|
TOTAL (C4-C6): 4,812,031
|
||||||
|
```
|
||||||
|
|
||||||
|
This confirms:
|
||||||
|
- ✅ `tiny_legacy_fallback_free_base_with_env()` is being executed (LEGACY fallback path).
|
||||||
|
- ✅ C4/C5/C6 inline slots push/pop are active in the LEGACY fallback/hot alloc paths.
|
||||||
|
|
||||||
|
## Overflow / Underflow Rates (WS=400 SSOT)
|
||||||
|
|
||||||
|
```
|
||||||
|
PUSH FULL (Free Path Ring Overflow):
|
||||||
|
TOTAL: 0 (0.00%)
|
||||||
|
|
||||||
|
POP EMPTY (Alloc Path Ring Underflow):
|
||||||
|
TOTAL: 168 (0.003%)
|
||||||
|
```
|
||||||
|
|
||||||
|
Interpretation:
|
||||||
|
- WS=400 SSOT is a **near-perfect steady state** for C4/C5/C6 inline slots.
|
||||||
|
- Overflow batching ROI is effectively zero: `push_full=0`, `pop_empty≈0.003%`.
|
||||||
|
|
||||||
|
## Phase 88 ROI Decision: **NO-GO**
|
||||||
|
|
||||||
|
### Recommendation
|
||||||
|
**DO NOT IMPLEMENT Phase 88 (Batch Drain Optimization)**
|
||||||
|
|
||||||
|
### Rationale
|
||||||
|
1. **Overflow is essentially absent**: `push_full=0`, `pop_empty≈0.003%`.
|
||||||
|
2. **Batch drain overhead would dominate**: any additional logic is far more likely to incur layout/branch tax than to save work.
|
||||||
|
3. **This is already the desirable state**: inline slots are sized correctly for WS=400 SSOT.
|
||||||
|
|
||||||
|
### Cost-Benefit Analysis
|
||||||
|
- **Implementation Cost**: high (batch logic, tests, ongoing maintenance)
|
||||||
|
- **Benefit Under SSOT**: ~0% (overflow frequency too low)
|
||||||
|
- **Risk**: layout tax / regression in a hot-path-heavy code region
|
||||||
|
|
||||||
|
### Alternative Path (If overflow work is desired)
|
||||||
|
Use a research workload that intentionally produces misses/overflow (e.g. larger WS), and re-run this observation.
|
||||||
|
Do not use WS=400 SSOT for that validation.
|
||||||
|
|
||||||
|
## Implementation Artifacts
|
||||||
|
|
||||||
|
### Files Created
|
||||||
|
- `core/box/tiny_inline_slots_overflow_stats_box.h` - Telemetry box header
|
||||||
|
- `core/box/tiny_inline_slots_overflow_stats_box.c` - Telemetry implementation
|
||||||
|
- `core/front/tiny_c{3,4,5,6}_inline_slots.h` - Updated with total counter calls
|
||||||
|
|
||||||
|
### Telemetry Infrastructure
|
||||||
|
- Atomic counters for thread-safe measurement
|
||||||
|
- Compile-time enabled (always in observation builds)
|
||||||
|
- Zero overhead when disabled (checked at init time)
|
||||||
|
- Percentage calculations for overflow rates
|
||||||
|
|
||||||
|
## Conclusion
|
||||||
|
|
||||||
|
**Phase 87 observation (with fixed telemetry gating) confirms that inline slots are active and overflow is negligible for WS=400 SSOT.**
|
||||||
|
Phase 88 is therefore correctly frozen as NO-GO for SSOT performance work.
|
||||||
|
|
||||||
|
### Score: NO-GO ✗
|
||||||
|
- Expected Improvement: ~0% (overflow extremely rare)
|
||||||
|
- Actual Improvement: N/A (measurement-only)
|
||||||
|
- Implementation Burden: High (new code path, batch logic)
|
||||||
|
- Recommendation: Archive Phase 88 pending inline slots adoption
|
||||||
186
docs/analysis/PHASE89_BOTTLENECK_ANALYSIS.md
Normal file
186
docs/analysis/PHASE89_BOTTLENECK_ANALYSIS.md
Normal file
@ -0,0 +1,186 @@
|
|||||||
|
# Phase 89: Bottleneck Analysis & Next Optimization Candidates
|
||||||
|
|
||||||
|
**Date**: 2025-12-18
|
||||||
|
**SSOT Baseline (Standard)**: 51.36M ops/s
|
||||||
|
**SSOT Optimized (FAST PGO)**: 54.16M ops/s (+5.45%)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Perf Profile Summary
|
||||||
|
|
||||||
|
**Profile Run**: 40M operations (0.78s), 833 samples
|
||||||
|
**Top 50 Functions by CPU Time**:
|
||||||
|
|
||||||
|
| Rank | Function | CPU Time | Type | Notes |
|
||||||
|
|------|----------|----------|------|-------|
|
||||||
|
| 1 | **free** | 27.40% | **HOTTEST** | Free path (malloc_tiny_fast main handler) |
|
||||||
|
| 2 | main | 26.30% | Loop | Benchmark loop structure (not optimizable) |
|
||||||
|
| 3 | **malloc** | 20.36% | **HOTTEST** | Alloc path (malloc_tiny_fast main handler) |
|
||||||
|
| 4 | malloc.cold | 10.65% | Cold path | Rarely executed alloc fallback |
|
||||||
|
| 5 | free.cold | 5.59% | Cold path | Rarely executed free fallback |
|
||||||
|
| 6 | **tiny_region_id_write_header** | 2.98% | **HOT** | Region metadata write (inlined candidate) |
|
||||||
|
| 7-50 | Various | ~5% | Minor | Page faults, memset, init (one-time/rare) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Observations
|
||||||
|
|
||||||
|
### CPU Time Breakdown:
|
||||||
|
- **malloc + free combined**: 47.76% (27.40% + 20.36%)
|
||||||
|
- This is the core allocation/deallocation hot path
|
||||||
|
- Current architecture: `malloc_tiny_fast.h` with inline slots (C4-C7) already optimized
|
||||||
|
|
||||||
|
- **tiny_region_id_write_header**: 2.98%
|
||||||
|
- Called during every free for C4-C7 classes
|
||||||
|
- Currently NOT inlined to all call sites (selective inlining only)
|
||||||
|
- Potential optimization: Force always_inline for hot paths
|
||||||
|
|
||||||
|
- **malloc.cold / free.cold**: 10.65% + 5.59% = 16.24%
|
||||||
|
- Cold paths (fallback routes)
|
||||||
|
- Should NOT be optimized (violates layout tax principle)
|
||||||
|
- Adding code to optimize cold paths increases code bloat
|
||||||
|
|
||||||
|
### Inline Slots Status (from OBSERVE):
|
||||||
|
- C4/C5/C6 inline slots ARE active during measurement
|
||||||
|
- PUSH TOTAL: 4.81M ops (100% of C4-C7 operations)
|
||||||
|
- Overflow rate: 0.003% (negligible)
|
||||||
|
- **Conclusion**: Inline slots are working perfectly, not a bottleneck
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Top 3 Optimization Candidates
|
||||||
|
|
||||||
|
### Candidate 1: tiny_region_id_write_header Inlining (2.98% CPU)
|
||||||
|
|
||||||
|
**Current Implementation**:
|
||||||
|
- Located in: `core/region_id_v6.c`
|
||||||
|
- Called from: `malloc_tiny_fast.h` during free path
|
||||||
|
- Current inlining: Selective (only some call sites)
|
||||||
|
|
||||||
|
**Opportunity**:
|
||||||
|
- Force `always_inline` on hot-path call sites to eliminate function call overhead
|
||||||
|
- Estimated savings: 1-2% CPU time (small gain, low risk)
|
||||||
|
- **Layout Impact**: MINIMAL (only modifying call site, not adding code bulk)
|
||||||
|
|
||||||
|
**Risk Assessment**:
|
||||||
|
- LOW: Function is already optimized, only changing inline strategy
|
||||||
|
- No new branches or code paths
|
||||||
|
- I-cache pressure: minimal (function body is ~30-50 cycles)
|
||||||
|
|
||||||
|
**Recommendation**: **YES - PURSUE**
|
||||||
|
- Implement: Add `__attribute__((always_inline))` to hot-path wrapper
|
||||||
|
- Target: Free path only (malloc path is lower frequency)
|
||||||
|
- Expected gain: +1-2% throughput
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Candidate 2: malloc/free Hot-Path Branch Reduction (47.76% CPU)
|
||||||
|
|
||||||
|
**Current Implementation**:
|
||||||
|
- Located in: `core/front/malloc_tiny_fast.h` (Phase 9/10/80-1 optimized)
|
||||||
|
- Already using: Fixed inline slots, switch dispatch, per-op policy snapshots
|
||||||
|
- Branches: 1-3 per operation (policy check, class route, handler dispatch)
|
||||||
|
|
||||||
|
**Opportunity**:
|
||||||
|
- Profile shows **56.4M branch-misses** out of ~1.75 insn/cycle
|
||||||
|
- This indicates branch prediction pressure, not a simple optimization
|
||||||
|
- Further reduction requires: Per-thread pre-computed routing tables or elimination of policy snapshot checks
|
||||||
|
|
||||||
|
**Analysis**:
|
||||||
|
- Phase 9/10/78-1/80-1/83-1 have already eliminated most low-hanging branches
|
||||||
|
- Remaining optimization would require structural change (pre-compute all routing at init time)
|
||||||
|
- **Risk**: Code bloat from pre-computed tables, potential layout tax regression
|
||||||
|
|
||||||
|
**Recommendation**: **DEFERRED TO PHASE 90+**
|
||||||
|
- Requires architectural change (similar to Phase 85's approach, which was NO-GO)
|
||||||
|
- Wait for overflow/workload characteristics that justify the complexity
|
||||||
|
- Current gains are saturated
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Candidate 3: Cold-Path De-duplication (malloc.cold/free.cold = 16.24% CPU)
|
||||||
|
|
||||||
|
**Current Implementation**:
|
||||||
|
- malloc.cold: 10.65% (fallback alloc path)
|
||||||
|
- free.cold: 5.59% (fallback free path)
|
||||||
|
|
||||||
|
**Opportunity**: NONE (Intentional Design)
|
||||||
|
|
||||||
|
**Rationale**:
|
||||||
|
- Cold paths are EXPLICITLY separate to avoid code bloat in hot path
|
||||||
|
- Separating code improves I-cache utilization for hot path
|
||||||
|
- Optimizing cold path would ADD code to hot path (violating layout tax principle)
|
||||||
|
- Cold paths are rarely executed in SSOT workload
|
||||||
|
|
||||||
|
**Recommendation**: **NO - DO NOT PURSUE**
|
||||||
|
- Aligns with user's emphasis on "avoiding layout tax"
|
||||||
|
- Cold paths are correctly placed
|
||||||
|
- Optimization here would hurt hot-path performance
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Performance Ceiling Analysis
|
||||||
|
|
||||||
|
**FAST PGO vs Standard: 5.45% delta**
|
||||||
|
|
||||||
|
This gap represents:
|
||||||
|
1. **PGO branch prediction optimizations** (~3%)
|
||||||
|
- PGO reorders frequently-taken paths
|
||||||
|
- Improves branch prediction hit rate
|
||||||
|
|
||||||
|
2. **Code layout optimizations** (~2%)
|
||||||
|
- Hottest functions placed contiguously
|
||||||
|
- Reduces I-cache misses
|
||||||
|
|
||||||
|
3. **Inlining decisions** (~0.5%)
|
||||||
|
- PGO optimizes inlining thresholds
|
||||||
|
- Fewer expensive calls in hot path
|
||||||
|
|
||||||
|
**Implication for Standard Build**:
|
||||||
|
- Standard build is fundamentally limited by branch prediction pressure
|
||||||
|
- Further gains require: (a) reducing branches, or (b) making branches more predictable
|
||||||
|
- Both options require careful architectural tradeoffs
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Recommended Strategy for Phase 90+
|
||||||
|
|
||||||
|
### Immediate (Quick Win):
|
||||||
|
1. **Phase 90: tiny_region_id_write_header always_inline**
|
||||||
|
- Effort: 1-2 lines of code
|
||||||
|
- Expected gain: +1-2%
|
||||||
|
- Risk: LOW
|
||||||
|
|
||||||
|
### Medium-term (Structural):
|
||||||
|
2. **Phase 91: Hot-path routing pre-computation (optional)**
|
||||||
|
- Only if overflow rate increases or workload changes
|
||||||
|
- Risk: MEDIUM (code bloat, layout tax)
|
||||||
|
- Expected gain: +2-3% (speculative)
|
||||||
|
|
||||||
|
3. **Phase 92: Allocator comparison sweep**
|
||||||
|
- Use FAST PGO as comparison baseline (+5.45%)
|
||||||
|
- Verify gap closure as individual optimizations accumulate
|
||||||
|
|
||||||
|
### Deferred:
|
||||||
|
- Avoid cold-path optimization (maintains I-cache discipline)
|
||||||
|
- Do NOT pursue redundant branch elimination (saturation point reached)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary Table
|
||||||
|
|
||||||
|
| Candidate | Priority | Effort | Risk | Expected Gain | Recommendation |
|
||||||
|
|-----------|----------|--------|------|----------------|-----------------|
|
||||||
|
| tiny_region_id_write_header inlining | HIGH | 1-2h | LOW | +1-2% | **PURSUE** |
|
||||||
|
| malloc/free branch reduction | MED | 20-40h | MEDIUM | +2-3% | DEFER |
|
||||||
|
| cold-path optimization | LOW | 10-20h | HIGH | +1% | **AVOID** |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layout Tax Adherence Check
|
||||||
|
|
||||||
|
✓ Candidate 1 (header inlining): No code bulk, maintains I-cache discipline
|
||||||
|
✓ Candidate 2 deferred: Avoids adding branches to hot path
|
||||||
|
✓ Candidate 3 avoided: Maintains cold-path separation principle
|
||||||
|
|
||||||
|
**Conclusion**: All recommendations align with user's "避けるlayout tax" principle.
|
||||||
141
docs/analysis/PHASE89_SSOT_MEASUREMENT.md
Normal file
141
docs/analysis/PHASE89_SSOT_MEASUREMENT.md
Normal file
@ -0,0 +1,141 @@
|
|||||||
|
# Phase 89 SSOT Measurement Capture
|
||||||
|
|
||||||
|
**Timestamp**: 2025-12-18 23:06:01
|
||||||
|
**Git SHA**: e4c5f0535
|
||||||
|
**Branch**: master
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 1: OBSERVE Binary (Telemetry Verification)
|
||||||
|
|
||||||
|
**Binary**: `./bench_random_mixed_hakmem_observe`
|
||||||
|
**Profile**: `MIXED_TINYV3_C7_SAFE`
|
||||||
|
**Iterations**: 20,000,000
|
||||||
|
**Working Set**: 400
|
||||||
|
|
||||||
|
**Inline Slots Overflow Stats (Preflight Verification)**:
|
||||||
|
- PUSH TOTAL: 4,812,031 ops (C4+C5+C6 verified active)
|
||||||
|
- POP TOTAL: 4,812,031 ops
|
||||||
|
- PUSH FULL: 0 (0.00%)
|
||||||
|
- POP EMPTY: 168 (0.003%)
|
||||||
|
- LEGACY FALLBACK CALLS: 5,327,294
|
||||||
|
- Judgment: ✓ \[C\] LEGACY used AND C4/C5/C6 INLINE SLOTS ACTIVE
|
||||||
|
- Throughput (with telemetry): **51.52M ops/s**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 2: Standard Build (Clean Performance Baseline)
|
||||||
|
|
||||||
|
**Binary**: `./bench_random_mixed_hakmem`
|
||||||
|
**Build Flags**: RELEASE, no telemetry, standard optimization
|
||||||
|
**Profile**: `MIXED_TINYV3_C7_SAFE`
|
||||||
|
**Iterations**: 20,000,000
|
||||||
|
**Working Set**: 400
|
||||||
|
**Runs**: 10
|
||||||
|
|
||||||
|
**10-Run Results**:
|
||||||
|
| Run | Throughput | Status |
|
||||||
|
|-----|-----------|--------|
|
||||||
|
| 1 | 51.15M | OK |
|
||||||
|
| 2 | 51.44M | OK |
|
||||||
|
| 3 | 51.61M | OK |
|
||||||
|
| 4 | 51.73M | Peak |
|
||||||
|
| 5 | 50.74M | Low |
|
||||||
|
| 6 | 51.34M | OK |
|
||||||
|
| 7 | 50.74M | Low |
|
||||||
|
| 8 | 51.37M | OK |
|
||||||
|
| 9 | 51.39M | OK |
|
||||||
|
| 10 | 51.31M | OK |
|
||||||
|
|
||||||
|
**Statistics**:
|
||||||
|
- **Mean**: 51.36M ops/s
|
||||||
|
- **Min**: 50.74M ops/s
|
||||||
|
- **Max**: 51.73M ops/s
|
||||||
|
- **Range**: 0.99M ops/s
|
||||||
|
- **CV**: ~0.7%
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 3: FAST PGO Build (Optimized Performance Tracking)
|
||||||
|
|
||||||
|
**Binary**: `./bench_random_mixed_hakmem_minimal_pgo`
|
||||||
|
**Build Flags**: RELEASE, PGO optimized, BENCH_MINIMAL=1
|
||||||
|
**Profile**: `MIXED_TINYV3_C7_SAFE`
|
||||||
|
**Iterations**: 20,000,000
|
||||||
|
**Working Set**: 400
|
||||||
|
**Runs**: 10
|
||||||
|
|
||||||
|
**10-Run Results**:
|
||||||
|
| Run | Throughput | Status |
|
||||||
|
|-----|-----------|--------|
|
||||||
|
| 1 | 55.13M | Peak |
|
||||||
|
| 2 | 54.73M | High |
|
||||||
|
| 3 | 53.81M | OK |
|
||||||
|
| 4 | 54.60M | High |
|
||||||
|
| 5 | 55.02M | Peak |
|
||||||
|
| 6 | 52.89M | Low |
|
||||||
|
| 7 | 53.61M | OK |
|
||||||
|
| 8 | 53.53M | OK |
|
||||||
|
| 9 | 55.08M | Peak |
|
||||||
|
| 10 | 53.51M | OK |
|
||||||
|
|
||||||
|
**Statistics**:
|
||||||
|
- **Mean**: 54.16M ops/s
|
||||||
|
- **Min**: 52.89M ops/s
|
||||||
|
- **Max**: 55.13M ops/s
|
||||||
|
- **Range**: 2.24M ops/s
|
||||||
|
- **CV**: ~1.5%
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Performance Delta Analysis
|
||||||
|
|
||||||
|
**Standard vs FAST PGO**:
|
||||||
|
- Delta: 54.16M - 51.36M = **2.80M ops/s**
|
||||||
|
- Percentage Gain: (2.80M / 51.36M) × 100 = **5.45%**
|
||||||
|
|
||||||
|
**Interpretation**:
|
||||||
|
- FAST PGO is 5.45% faster than Standard build
|
||||||
|
- This represents the optimization ceiling with current profile-guided configuration
|
||||||
|
- SSOT baseline for bottleneck analysis: **Standard 51.36M ops/s**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Environment Configuration (SSOT Locked)
|
||||||
|
|
||||||
|
**Key ENV variables** (forced in `scripts/run_mixed_10_cleanenv.sh`):
|
||||||
|
- `HAKMEM_BENCH_MIN_SIZE=16` - SSOT: prevent size drift
|
||||||
|
- `HAKMEM_BENCH_MAX_SIZE=1040` - SSOT: prevent class filtering
|
||||||
|
- `HAKMEM_BENCH_C5_ONLY=0` - SSOT: no single-class mode
|
||||||
|
- `HAKMEM_BENCH_C6_ONLY=0` - SSOT: no single-class mode
|
||||||
|
- `HAKMEM_BENCH_C7_ONLY=0` - SSOT: no single-class mode
|
||||||
|
- `HAKMEM_WARM_POOL_SIZE=16` - Phase 69 winner
|
||||||
|
- `HAKMEM_TINY_C4_INLINE_SLOTS=1` - Phase 76-1 promoted
|
||||||
|
- `HAKMEM_TINY_C5_INLINE_SLOTS=1` - Phase 75-2 promoted
|
||||||
|
- `HAKMEM_TINY_C6_INLINE_SLOTS=1` - Phase 75-1 promoted
|
||||||
|
- `HAKMEM_TINY_INLINE_SLOTS_FIXED=1` - Phase 78-1 promoted
|
||||||
|
- `HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=1` - Phase 80-1 promoted
|
||||||
|
- `HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH_FIXED=0` - Phase 83-1 NO-GO
|
||||||
|
- `HAKMEM_FASTLANE_DIRECT=1` - Phase 19-1b promoted
|
||||||
|
- `HAKMEM_FREE_TINY_FAST_MONO_DUALHOT=1` - Phase 9/10 promoted
|
||||||
|
- `HAKMEM_FREE_TINY_FAST_MONO_LEGACY_DIRECT=1` - Phase 10 promoted
|
||||||
|
- `HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE` - default route
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## System Configuration
|
||||||
|
|
||||||
|
- **CPU**: AMD Ryzen 7 5825U with Radeon Graphics
|
||||||
|
- **Cores**: 16
|
||||||
|
- **Memory**: MemTotal: 13166508 kB
|
||||||
|
- **Kernel**: 6.8.0-87-generic
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps (Phase 89 Step 5)
|
||||||
|
|
||||||
|
**Objective**: Identify top 3 bottleneck candidates using perf measurement
|
||||||
|
- Run `perf top` during Mixed SSOT execution
|
||||||
|
- Analyze top 50 functions by CPU time
|
||||||
|
- Filter to high-frequency code paths (avoid 0.001% optimizations)
|
||||||
|
- Prepare recommendations for Phase 90+
|
||||||
145
docs/analysis/PHASE90_STRUCTURAL_REVIEW_AND_GAP_TRIAGE_SSOT.md
Normal file
145
docs/analysis/PHASE90_STRUCTURAL_REVIEW_AND_GAP_TRIAGE_SSOT.md
Normal file
@ -0,0 +1,145 @@
|
|||||||
|
# Phase 90: Structural Review & Gap Triage(mimalloc/tcmalloc 差分を“設計”に落とす SSOT)
|
||||||
|
|
||||||
|
目的: 「layout tax を疑う/疑わない」以前に、**差分がどこから来ているか**を “同じ儀式” で毎回再現し、次の構造案(Phase 91+)を決める。
|
||||||
|
|
||||||
|
前提:
|
||||||
|
- SSOT runner(性能の正): `scripts/run_mixed_10_cleanenv.sh`(`ITERS=20000000 WS=400 RUNS=10`)
|
||||||
|
- OBSERVE runner(経路の正): `scripts/run_mixed_observe_ssot.sh`(telemetry込み、性能比較に使わない)
|
||||||
|
- 現行SSOT(Phase 89): `docs/analysis/PHASE89_SSOT_MEASUREMENT.md`
|
||||||
|
|
||||||
|
非目標:
|
||||||
|
- 長時間 soak(5分/30分/60分)は Phase 90 ではやらない。
|
||||||
|
- “1行の micro-opt” は Phase 90 ではやらない(Phase 91+ の入力だけ作る)。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Box Theory ルール(Phase 90 版)
|
||||||
|
|
||||||
|
1. **境界は1箇所**: 測定の入口はスクリプトで固定(手打ち禁止)。
|
||||||
|
2. **戻せる**: 比較は同一バイナリ ENV トグル、または “同一バイナリ LD_PRELOAD” を優先。
|
||||||
|
3. **見える化**: まず OBSERVE で「踏んでる」を確定し、SSOT で数値を取る。
|
||||||
|
4. **Fail-fast**: `HAKMEM_PROFILE` 未指定など SSOT 違反は即エラー(スクリプト側で強制)。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 0: SSOT Preflight(経路確認、性能ではない)
|
||||||
|
|
||||||
|
目的: “踏んでない最適化” を排除する。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
make bench_random_mixed_hakmem_observe
|
||||||
|
HAKMEM_ROUTE_BANNER=1 ./scripts/run_mixed_observe_ssot.sh | tee /tmp/phase90_observe_preflight.log
|
||||||
|
```
|
||||||
|
|
||||||
|
判定:
|
||||||
|
- `Route assignments` が想定と一致していること(Mixed SSOT の既定は多くが `LEGACY` になりがち)
|
||||||
|
- `Inline Slots Overflow Stats` が **PUSH/POP TOTAL > 0** であること(C4/C5/C6 inline slots が生きている)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 1: hakmem SSOT baseline(Standard / FAST PGO)
|
||||||
|
|
||||||
|
目的: Phase 89 と同じ条件で “今の値” を固定する(CV 付き)。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
make bench_random_mixed_hakmem
|
||||||
|
./scripts/run_mixed_10_cleanenv.sh | tee /tmp/phase90_hakmem_standard_10run.log
|
||||||
|
|
||||||
|
make pgo-fast-full
|
||||||
|
BENCH_BIN=./bench_random_mixed_hakmem_minimal_pgo ./scripts/run_mixed_10_cleanenv.sh | tee /tmp/phase90_hakmem_fastpgo_10run.log
|
||||||
|
```
|
||||||
|
|
||||||
|
記録(SSOTに必須):
|
||||||
|
- `git rev-parse HEAD`
|
||||||
|
- `Mean/Median/CV`
|
||||||
|
- `HAKMEM_PROFILE`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 2: allocator reference(短時間、長時間なし)
|
||||||
|
|
||||||
|
目的: “外部強者の位置” を数値で固定する(ただし reference)。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
make bench_random_mixed_system bench_random_mixed_mi
|
||||||
|
RUNS=10 scripts/run_allocator_quick_matrix.sh | tee /tmp/phase90_allocator_quick_matrix.log
|
||||||
|
```
|
||||||
|
|
||||||
|
注意:
|
||||||
|
- これは **reference**(別バイナリ/LD_PRELOAD が混ざる)。
|
||||||
|
- SSOT(最適化判断)は必ず Step 1 の同一儀式で行う。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 3: same-binary matrix(layout差を最小化、設計差を浮かせる)
|
||||||
|
|
||||||
|
目的: 「hakmemが遅い」の原因が “layout/ベンチ差” か “アルゴリズム/固定費” かを切り分ける。
|
||||||
|
|
||||||
|
```bash
|
||||||
|
make bench_random_mixed_system shared
|
||||||
|
RUNS=10 scripts/run_allocator_preload_matrix.sh | tee /tmp/phase90_allocator_preload_matrix.log
|
||||||
|
```
|
||||||
|
|
||||||
|
読み方:
|
||||||
|
- `bench_random_mixed_hakmem*`(linked SSOT)と **同じ数値になる必要はない**(経路が違う)。
|
||||||
|
- ここで見るのは「同一入口(malloc/free)での相対差」。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 4: perf stat(同一カウンタで “差分の形” を固定)
|
||||||
|
|
||||||
|
目的: “速い/遅い” を命令/分岐/メモリのどれで負けているかに落とす。
|
||||||
|
|
||||||
|
### hakmem(linked)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
perf stat -e cycles,instructions,branches,branch-misses,cache-misses,iTLB-load-misses,dTLB-load-misses \\
|
||||||
|
./bench_random_mixed_hakmem 20000000 400 1 2>&1 | tee /tmp/phase90_perfstat_hakmem_linked.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
### system binary + LD_PRELOAD(tcmalloc/jemalloc/mimalloc)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
perf stat -e cycles,instructions,branches,branch-misses,cache-misses,iTLB-load-misses,dTLB-load-misses \\
|
||||||
|
env LD_PRELOAD=\"$TCMALLOC_SO\" ./bench_random_mixed_system 20000000 400 1 2>&1 | tee /tmp/phase90_perfstat_tcmalloc_preload.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 90 の “設計判断” 出力(Phase 91 の入力)
|
||||||
|
|
||||||
|
Phase 90 はここで終わり。次のどれを採用するかは **Step 1〜4 の差分**で決める。
|
||||||
|
|
||||||
|
### A) 固定費(命令/分岐)が負けている(最頻パターン)
|
||||||
|
|
||||||
|
狙い:
|
||||||
|
- per-op の “儀式”(route/policy/env/gate)を hot path から追放
|
||||||
|
- できる限り **commit-once / fixed mode** へ寄せる(ただし layout tax を避ける形で)
|
||||||
|
|
||||||
|
次フェーズ候補:
|
||||||
|
- Phase 91: “Hot path contract” の再定義(どの箱を踏まないか、を SSOT 化)
|
||||||
|
|
||||||
|
### B) メモリ系(cache/TLB)が負けている
|
||||||
|
|
||||||
|
狙い:
|
||||||
|
- TLS 構造のサイズ/配置、ptr→meta 到達、書き込み順序(dependency chain)を見直す
|
||||||
|
|
||||||
|
次フェーズ候補:
|
||||||
|
- Phase 91: TLS struct packing / hot fields co-location(小さく、戻せる)
|
||||||
|
|
||||||
|
### C) 同一バイナリ(LD_PRELOAD)では差が小さい
|
||||||
|
|
||||||
|
狙い:
|
||||||
|
- linked SSOT 側の “入口/配置/箱列” が重い(もしくはベンチ差分)
|
||||||
|
|
||||||
|
次フェーズ候補:
|
||||||
|
- Phase 91: linked SSOT の入口を drop-in と揃える(比較の意味を合わせる)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## GO/NO-GO(Phase 90)
|
||||||
|
|
||||||
|
Phase 90 は “計測と設計判断の SSOT 化” が成果物。
|
||||||
|
- **GO**: Step 0〜4 が再現可能(ログが揃い、差分の形が説明できる)
|
||||||
|
- **NO-GO**: `HAKMEM_PROFILE` 未指定/ENV漏れ等で結果が破綻(先に SSOT 儀式を修正)
|
||||||
|
|
||||||
157
docs/analysis/PHASE92_TCMALLOC_GAP_TRIAGE_SSOT.md
Normal file
157
docs/analysis/PHASE92_TCMALLOC_GAP_TRIAGE_SSOT.md
Normal file
@ -0,0 +1,157 @@
|
|||||||
|
# Phase 92: tcmalloc Gap Triage SSOT
|
||||||
|
|
||||||
|
## 目的
|
||||||
|
|
||||||
|
Phase 89 で検出した tcmalloc との性能ギャップ(hakmem: 52M vs tcmalloc: 58M)を**短時間で**原因分類する。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 既知事実(Phase 89 から継承)
|
||||||
|
|
||||||
|
- **hakmem baseline**: 51.36M ops/s (SSOT standard)
|
||||||
|
- **tcmalloc**: 58M ops/s 付近(参考値)
|
||||||
|
- **差分**: -12.8%( hakmem が遅い)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 92 Triage フロー(最短 1-2h)
|
||||||
|
|
||||||
|
### 1️⃣ **ケース A:小オブジェクト(C4-C6) vs 大オブジェクト(C7+)**
|
||||||
|
|
||||||
|
**疑問**: tcmalloc の優位は「小サイズに特化」か「大サイズに強い」か?
|
||||||
|
|
||||||
|
**実施**:
|
||||||
|
```bash
|
||||||
|
# C6 のみ(Small, 16-256B)
|
||||||
|
HAKMEM_BENCH_C6_ONLY=1 RUNS=3 ./scripts/run_mixed_10_cleanenv.sh
|
||||||
|
|
||||||
|
# C7 のみ(Large, 1024B+)
|
||||||
|
HAKMEM_BENCH_C7_ONLY=1 RUNS=3 ./scripts/run_mixed_10_cleanenv.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
**判定**:
|
||||||
|
- C6 > 52M, C7 < 45M → **問題は Large alloc(C7)**
|
||||||
|
- C6 < 50M, C7 < 45M → **問題は均等分散**
|
||||||
|
- C6 > 52M, C7 > 48M → **問題は別(メモリ効率?)**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2️⃣ **ケース B:Unified Cache vs Inline Slots**
|
||||||
|
|
||||||
|
**疑問**: tcmalloc 優位は「キャッシュ管理」か「インライン最適化」か?
|
||||||
|
|
||||||
|
**実施**:
|
||||||
|
```bash
|
||||||
|
# Inline Slots 全無効
|
||||||
|
HAKMEM_TINY_C6_INLINE_SLOTS=0 HAKMEM_TINY_C5_INLINE_SLOTS=0 \
|
||||||
|
HAKMEM_TINY_C4_INLINE_SLOTS=0 RUNS=3 ./scripts/run_mixed_10_cleanenv.sh
|
||||||
|
|
||||||
|
# Unified Cache のみ(inline slots 全 OFF)
|
||||||
|
HAKMEM_UNIFIED_CACHE_ONLY=1 RUNS=3 ./scripts/run_mixed_10_cleanenv.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
**判定**:
|
||||||
|
- `-inline > 50M` → **inline slots オーバーヘッド**
|
||||||
|
- `-inline < 48M` → **unified cache 自体が遅い**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3️⃣ **ケース C:フラグメンテーション/再利用効率**
|
||||||
|
|
||||||
|
**疑問**: LIFO vs FIFO の差、または tcmalloc の再利用戦略の優位性?
|
||||||
|
|
||||||
|
**実施**:
|
||||||
|
```bash
|
||||||
|
# LIFO 有効(phase 15)
|
||||||
|
HAKMEM_TINY_UNIFIED_LIFO=1 RUNS=3 ./scripts/run_mixed_10_cleanenv.sh
|
||||||
|
|
||||||
|
# FIFO(default)
|
||||||
|
RUNS=3 ./scripts/run_mixed_10_cleanenv.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
**判定**:
|
||||||
|
- LIFO > +1% → **FIFO が問題候補**
|
||||||
|
- LIFO = FIFO ± 0.5% → **LIFO/FIFO は neutral**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 4️⃣ **ケース D:ページサイズ/プールサイズ**
|
||||||
|
|
||||||
|
**疑問**: tcmalloc と hakmem のメモリレイアウト / warm pool size の違い?
|
||||||
|
|
||||||
|
**実施**:
|
||||||
|
```bash
|
||||||
|
# 大プール(確保多く、断片化少なく)
|
||||||
|
HAKMEM_WARM_POOL_SIZE=100000 RUNS=3 ./scripts/run_mixed_10_cleanenv.sh
|
||||||
|
|
||||||
|
# 小プール(確保少なく、効率見直し)
|
||||||
|
HAKMEM_WARM_POOL_SIZE=1000 RUNS=3 ./scripts/run_mixed_10_cleanenv.sh
|
||||||
|
|
||||||
|
# デフォルト
|
||||||
|
RUNS=3 ./scripts/run_mixed_10_cleanenv.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
**判定**:
|
||||||
|
- pool big > baseline → **プール不足(確保過多)**
|
||||||
|
- pool small < baseline → **プール不足(メモリ不足)**
|
||||||
|
- pool default = baseline → **pool size neutral**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 測定時間見積もり
|
||||||
|
|
||||||
|
| ケース | 実施数 | 時間/実施 | 合計 |
|
||||||
|
|--------|--------|----------|------|
|
||||||
|
| A (C6/C7) | 2×3=6 | 2 min | 12 min |
|
||||||
|
| B (inline) | 2×3=6 | 2 min | 12 min |
|
||||||
|
| C (LIFO) | 2×3=6 | 2 min | 12 min |
|
||||||
|
| D (pool) | 3×3=9 | 2 min | 18 min |
|
||||||
|
| **合計** | - | - | **54 min** |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 判定マトリクス
|
||||||
|
|
||||||
|
| ケース | 結果 | 判定 | 次アクション |
|
||||||
|
|--------|------|------|-------------|
|
||||||
|
| A | C6 > 52M, C7 低 | C7 が制限 | Phase 93: C7 最適化 |
|
||||||
|
| B | -inline > 50M | Inline 段階的 OFF | Phase 94: Inline review |
|
||||||
|
| C | LIFO > +1% | LIFO 推奨 | Phase 92b: LIFO 展開 |
|
||||||
|
| D | pool_big > +2% | 確保が重い | Phase 95: Pool tuning |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 記録フォーマット
|
||||||
|
|
||||||
|
結果は下記フォーマットで PHASE92_TCMALLOC_GAP_RESULTS.txt に記録:
|
||||||
|
|
||||||
|
```
|
||||||
|
=== Phase 92 Triage Results ===
|
||||||
|
Baseline (51.36M): [ENTER CONTROL VALUE]
|
||||||
|
|
||||||
|
ケース A (C6 vs C7):
|
||||||
|
C6-only: [VALUE] ops/s
|
||||||
|
C7-only: [VALUE] ops/s
|
||||||
|
判定: [CONCLUSION]
|
||||||
|
|
||||||
|
ケース B (Inline vs Unified):
|
||||||
|
No-inline: [VALUE] ops/s
|
||||||
|
Unified-only: [VALUE] ops/s
|
||||||
|
判定: [CONCLUSION]
|
||||||
|
|
||||||
|
ケース C (LIFO vs FIFO):
|
||||||
|
LIFO: [VALUE] ops/s
|
||||||
|
FIFO: [VALUE] ops/s
|
||||||
|
判定: [CONCLUSION]
|
||||||
|
|
||||||
|
ケース D (Pool sizing):
|
||||||
|
Pool-big: [VALUE] ops/s
|
||||||
|
Pool-small: [VALUE] ops/s
|
||||||
|
Pool-default: [VALUE] ops/s
|
||||||
|
判定: [CONCLUSION]
|
||||||
|
|
||||||
|
=== FINAL VERDICT ===
|
||||||
|
Primary bottleneck: [A|B|C|D|MIXED]
|
||||||
|
Next phase: Phase 9x [recommendation]
|
||||||
|
```
|
||||||
|
|
||||||
100
docs/analysis/SSOT_BUILD_MODES.md
Normal file
100
docs/analysis/SSOT_BUILD_MODES.md
Normal file
@ -0,0 +1,100 @@
|
|||||||
|
# SSOT Build Modes: Standard / FAST / OBSERVE の役割定義
|
||||||
|
|
||||||
|
## 目的
|
||||||
|
|
||||||
|
ベンチマーク測定において、**ビルドモード**と**測定モード**を分離し、
|
||||||
|
各フェーズで何を測定するかを明確化する。
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3つのモード
|
||||||
|
|
||||||
|
### 1. **Standard Build** (`-DNDEBUG`)
|
||||||
|
- **役割**: 本番相当、最適化最大
|
||||||
|
- **使用**: Phase 89+ 本格 SSOT(A/B テスト、GO/NO-GO 判定)
|
||||||
|
- **スクリプト**: `scripts/run_mixed_10_cleanenv.sh`
|
||||||
|
- **出力**: Throughput(最終スコア)
|
||||||
|
- **特性**: LTO, -O3, frame-pointer 削除、統計安定性:CV < 2%
|
||||||
|
|
||||||
|
### 2. **FAST Build** (`HAKMEM_BENCH_FAST_MODE=1`)
|
||||||
|
- **役割**: 最大パフォーマンス引き出し(PGO、キャッシュ最適化)
|
||||||
|
- **使用**: 性能天井確認、設計上限検証
|
||||||
|
- **スクリプト**: `scripts/run_mixed_fast_pgo_ssot.sh`(要作成)
|
||||||
|
- **出力**: Throughput(ceiling reference)
|
||||||
|
- **特性**: Profile-Guided Optimization, aggressive inlining
|
||||||
|
|
||||||
|
### 3. **OBSERVE Build**
|
||||||
|
- **役割**: 経路確認、フローダンプ
|
||||||
|
- **使用**: ENV ドリフト検出、設定妥当性確認
|
||||||
|
- **スクリプト**: `scripts/run_mixed_observe_ssot.sh`
|
||||||
|
- **出力**: 詳細統計(inline slots 活動、unified cache hit/miss、legacy fallback 呼び出し)
|
||||||
|
- **特性**: メトリクス収集、診断情報
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## SSOT 測定手順(標準パターン)
|
||||||
|
|
||||||
|
### 流れ
|
||||||
|
|
||||||
|
```
|
||||||
|
1. OBSERVE (diagnosis)
|
||||||
|
→ 経路が正しいか確認(「LEGACY used AND C6 INLINE SLOTS ACTIVE」の判定)
|
||||||
|
→ ENV 設定ドリフトを検出
|
||||||
|
|
||||||
|
2. Standard SSOT (control + treatment)
|
||||||
|
→ IFL=0 (control) 10-run
|
||||||
|
→ IFL=1 (treatment) 10-run
|
||||||
|
→ 統計的に有意な差があるか判定
|
||||||
|
|
||||||
|
3. if NO-GO → FAST build で ceiling 確認
|
||||||
|
→ design は correct か、implementation は correct か の切り分け
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 各モードの環境管理
|
||||||
|
|
||||||
|
### Standard
|
||||||
|
```bash
|
||||||
|
HAKMEM_BENCH_MIN_SIZE=16 HAKMEM_BENCH_MAX_SIZE=1040
|
||||||
|
HAKMEM_BENCH_C5_ONLY=0 HAKMEM_BENCH_C6_ONLY=0 HAKMEM_BENCH_C7_ONLY=0
|
||||||
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_SAFE
|
||||||
|
```
|
||||||
|
|
||||||
|
### FAST(将来)
|
||||||
|
```bash
|
||||||
|
HAKMEM_BENCH_FAST_MODE=1
|
||||||
|
HAKMEM_PROFILE=MIXED_TINYV3_C7_FAST_PGO (要定義)
|
||||||
|
```
|
||||||
|
|
||||||
|
### OBSERVE
|
||||||
|
```bash
|
||||||
|
# Standard + diagnostic metrics
|
||||||
|
HAKMEM_UNIFIED_CACHE_STATS_COMPILED=1
|
||||||
|
HAKMEM_INLINE_SLOTS_OVERFLOW_STATS=1
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## GO/NO-GO 判定基準
|
||||||
|
|
||||||
|
| 指標 | 基準 | 判定 |
|
||||||
|
|------|------|------|
|
||||||
|
| 改善度 | ≥ +1.0% | GO |
|
||||||
|
| CV(変動係数) | < 3% | 統計安定 |
|
||||||
|
| 回帰 | < -1.0% | NO-GO(重大) |
|
||||||
|
| 観測スコア | baseline × 1.018 以上 | strong GO |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 参考:Phase 91 (C6 IFL) の例
|
||||||
|
|
||||||
|
**OBSERVE 結果**:
|
||||||
|
- 経路確認:✓ LEGACY used AND inline slots active
|
||||||
|
- スコア:51.47M ops/s
|
||||||
|
|
||||||
|
**Standard SSOT 結果**:
|
||||||
|
- Control (IFL=0):52.05M ops/s, CV 1.2%
|
||||||
|
- Treatment (IFL=1):52.25M ops/s, CV 1.5%
|
||||||
|
- 改善度:+0.38%
|
||||||
|
- 判定:NEUTRAL(目標未達)→ NO-GO
|
||||||
10
hakmem.d
10
hakmem.d
@ -122,6 +122,7 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
|
|||||||
core/box/../front/../box/../front/../box/tiny_c4_inline_slots_env_box.h \
|
core/box/../front/../box/../front/../box/tiny_c4_inline_slots_env_box.h \
|
||||||
core/box/../front/../box/../front/../box/../hakmem_build_flags.h \
|
core/box/../front/../box/../front/../box/../hakmem_build_flags.h \
|
||||||
core/box/../front/../box/../front/../box/tiny_c5_inline_slots_env_box.h \
|
core/box/../front/../box/../front/../box/tiny_c5_inline_slots_env_box.h \
|
||||||
|
core/box/../front/../box/../front/../box/tiny_inline_slots_overflow_stats_box.h \
|
||||||
core/box/../front/../box/tiny_c5_inline_slots_env_box.h \
|
core/box/../front/../box/tiny_c5_inline_slots_env_box.h \
|
||||||
core/box/../front/../box/../front/tiny_c5_inline_slots.h \
|
core/box/../front/../box/../front/tiny_c5_inline_slots.h \
|
||||||
core/box/../front/../box/../front/../box/tiny_c5_inline_slots_env_box.h \
|
core/box/../front/../box/../front/../box/tiny_c5_inline_slots_env_box.h \
|
||||||
@ -142,6 +143,9 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
|
|||||||
core/box/../front/../box/tiny_inline_slots_fixed_mode_box.h \
|
core/box/../front/../box/tiny_inline_slots_fixed_mode_box.h \
|
||||||
core/box/../front/../box/tiny_inline_slots_switch_dispatch_box.h \
|
core/box/../front/../box/tiny_inline_slots_switch_dispatch_box.h \
|
||||||
core/box/../front/../box/tiny_inline_slots_switch_dispatch_fixed_box.h \
|
core/box/../front/../box/tiny_inline_slots_switch_dispatch_fixed_box.h \
|
||||||
|
core/box/../front/../box/tiny_c6_inline_slots_ifl_env_box.h \
|
||||||
|
core/box/../front/../box/tiny_c6_inline_slots_ifl_tls_box.h \
|
||||||
|
core/box/../front/../box/tiny_c6_intrusive_freelist_box.h \
|
||||||
core/box/../front/../box/tiny_front_cold_box.h \
|
core/box/../front/../box/tiny_front_cold_box.h \
|
||||||
core/box/../front/../box/tiny_layout_box.h \
|
core/box/../front/../box/tiny_layout_box.h \
|
||||||
core/box/../front/../box/tiny_hotheap_v2_box.h \
|
core/box/../front/../box/tiny_hotheap_v2_box.h \
|
||||||
@ -184,6 +188,7 @@ hakmem.o: core/hakmem.c core/hakmem.h core/hakmem_build_flags.h \
|
|||||||
core/box/../front/../box/tiny_metadata_cache_env_box.h \
|
core/box/../front/../box/tiny_metadata_cache_env_box.h \
|
||||||
core/box/../front/../box/hakmem_env_snapshot_box.h \
|
core/box/../front/../box/hakmem_env_snapshot_box.h \
|
||||||
core/box/../front/../box/tiny_unified_cache_fastapi_env_box.h \
|
core/box/../front/../box/tiny_unified_cache_fastapi_env_box.h \
|
||||||
|
core/box/../front/../box/tiny_inline_slots_overflow_stats_box.h \
|
||||||
core/box/../front/../box/tiny_ptr_convert_box.h \
|
core/box/../front/../box/tiny_ptr_convert_box.h \
|
||||||
core/box/../front/../box/tiny_front_stats_box.h \
|
core/box/../front/../box/tiny_front_stats_box.h \
|
||||||
core/box/../front/../box/free_path_stats_box.h \
|
core/box/../front/../box/free_path_stats_box.h \
|
||||||
@ -415,6 +420,7 @@ core/box/../front/../box/../front/../box/tiny_c3_inline_slots_env_box.h:
|
|||||||
core/box/../front/../box/../front/../box/tiny_c4_inline_slots_env_box.h:
|
core/box/../front/../box/../front/../box/tiny_c4_inline_slots_env_box.h:
|
||||||
core/box/../front/../box/../front/../box/../hakmem_build_flags.h:
|
core/box/../front/../box/../front/../box/../hakmem_build_flags.h:
|
||||||
core/box/../front/../box/../front/../box/tiny_c5_inline_slots_env_box.h:
|
core/box/../front/../box/../front/../box/tiny_c5_inline_slots_env_box.h:
|
||||||
|
core/box/../front/../box/../front/../box/tiny_inline_slots_overflow_stats_box.h:
|
||||||
core/box/../front/../box/tiny_c5_inline_slots_env_box.h:
|
core/box/../front/../box/tiny_c5_inline_slots_env_box.h:
|
||||||
core/box/../front/../box/../front/tiny_c5_inline_slots.h:
|
core/box/../front/../box/../front/tiny_c5_inline_slots.h:
|
||||||
core/box/../front/../box/../front/../box/tiny_c5_inline_slots_env_box.h:
|
core/box/../front/../box/../front/../box/tiny_c5_inline_slots_env_box.h:
|
||||||
@ -435,6 +441,9 @@ core/box/../front/../box/../front/../box/tiny_c3_inline_slots_env_box.h:
|
|||||||
core/box/../front/../box/tiny_inline_slots_fixed_mode_box.h:
|
core/box/../front/../box/tiny_inline_slots_fixed_mode_box.h:
|
||||||
core/box/../front/../box/tiny_inline_slots_switch_dispatch_box.h:
|
core/box/../front/../box/tiny_inline_slots_switch_dispatch_box.h:
|
||||||
core/box/../front/../box/tiny_inline_slots_switch_dispatch_fixed_box.h:
|
core/box/../front/../box/tiny_inline_slots_switch_dispatch_fixed_box.h:
|
||||||
|
core/box/../front/../box/tiny_c6_inline_slots_ifl_env_box.h:
|
||||||
|
core/box/../front/../box/tiny_c6_inline_slots_ifl_tls_box.h:
|
||||||
|
core/box/../front/../box/tiny_c6_intrusive_freelist_box.h:
|
||||||
core/box/../front/../box/tiny_front_cold_box.h:
|
core/box/../front/../box/tiny_front_cold_box.h:
|
||||||
core/box/../front/../box/tiny_layout_box.h:
|
core/box/../front/../box/tiny_layout_box.h:
|
||||||
core/box/../front/../box/tiny_hotheap_v2_box.h:
|
core/box/../front/../box/tiny_hotheap_v2_box.h:
|
||||||
@ -477,6 +486,7 @@ core/box/../front/../box/tiny_front_hot_box.h:
|
|||||||
core/box/../front/../box/tiny_metadata_cache_env_box.h:
|
core/box/../front/../box/tiny_metadata_cache_env_box.h:
|
||||||
core/box/../front/../box/hakmem_env_snapshot_box.h:
|
core/box/../front/../box/hakmem_env_snapshot_box.h:
|
||||||
core/box/../front/../box/tiny_unified_cache_fastapi_env_box.h:
|
core/box/../front/../box/tiny_unified_cache_fastapi_env_box.h:
|
||||||
|
core/box/../front/../box/tiny_inline_slots_overflow_stats_box.h:
|
||||||
core/box/../front/../box/tiny_ptr_convert_box.h:
|
core/box/../front/../box/tiny_ptr_convert_box.h:
|
||||||
core/box/../front/../box/tiny_front_stats_box.h:
|
core/box/../front/../box/tiny_front_stats_box.h:
|
||||||
core/box/../front/../box/free_path_stats_box.h:
|
core/box/../front/../box/free_path_stats_box.h:
|
||||||
|
|||||||
@ -10,6 +10,22 @@ ws=${WS:-400}
|
|||||||
runs=${RUNS:-10}
|
runs=${RUNS:-10}
|
||||||
bin=${BENCH_BIN:-./bench_random_mixed_hakmem}
|
bin=${BENCH_BIN:-./bench_random_mixed_hakmem}
|
||||||
|
|
||||||
|
# SSOT header: bin sha / profile / iters / ws / runs
|
||||||
|
echo "[SSOT-HEADER] bin=$(sha256sum "${bin}" | cut -c1-8) profile=${profile} iters=${iters} ws=${ws} runs=${runs}"
|
||||||
|
|
||||||
|
# Bench size range SSOT (bench_random_mixed.c reads these).
|
||||||
|
# IMPORTANT: we FORCE these to avoid leaked exports causing "wrong classes exercised"
|
||||||
|
# (e.g. only <=256B => C4/C5/C6 inline-slots never invoked).
|
||||||
|
ssot_min_size=${SSOT_MIN_SIZE:-16}
|
||||||
|
ssot_max_size=${SSOT_MAX_SIZE:-1040} # matches bench default (16..1040 ≒ 16..1024)
|
||||||
|
export HAKMEM_BENCH_MIN_SIZE="${ssot_min_size}"
|
||||||
|
export HAKMEM_BENCH_MAX_SIZE="${ssot_max_size}"
|
||||||
|
|
||||||
|
# Disable fixed-size bench modes (must be forced to avoid leaks).
|
||||||
|
export HAKMEM_BENCH_C5_ONLY=0
|
||||||
|
export HAKMEM_BENCH_C6_ONLY=0
|
||||||
|
export HAKMEM_BENCH_C7_ONLY=0
|
||||||
|
|
||||||
# Keep profiles reproducible even if user exported env vars.
|
# Keep profiles reproducible even if user exported env vars.
|
||||||
case "${profile}" in
|
case "${profile}" in
|
||||||
MIXED_TINYV3_C7_BALANCED)
|
MIXED_TINYV3_C7_BALANCED)
|
||||||
@ -53,6 +69,11 @@ export HAKMEM_TINY_INLINE_SLOTS_FIXED=${HAKMEM_TINY_INLINE_SLOTS_FIXED:-1}
|
|||||||
# NOTE: Phase 80-1 winner (Switch dispatch for inline slots, removes if-chain comparisons)
|
# NOTE: Phase 80-1 winner (Switch dispatch for inline slots, removes if-chain comparisons)
|
||||||
export HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=${HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH:-1}
|
export HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=${HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH:-1}
|
||||||
|
|
||||||
|
if [[ "${HAKMEM_BENCH_HEADER_LOG:-1}" == "1" ]]; then
|
||||||
|
sha="$(git rev-parse --short HEAD 2>/dev/null || echo unknown)"
|
||||||
|
echo "[SSOT] sha=${sha} bin=${bin} profile=${profile} iters=${iters} ws=${ws} runs=${runs} size=${ssot_min_size}..${ssot_max_size}" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
if [[ "${HAKMEM_BENCH_ENV_LOG:-0}" == "1" ]]; then
|
if [[ "${HAKMEM_BENCH_ENV_LOG:-0}" == "1" ]]; then
|
||||||
if [[ -x ./scripts/bench_env_banner.sh ]]; then
|
if [[ -x ./scripts/bench_env_banner.sh ]]; then
|
||||||
./scripts/bench_env_banner.sh >&2 || true
|
./scripts/bench_env_banner.sh >&2 || true
|
||||||
|
|||||||
47
scripts/run_mixed_observe_ssot.sh
Executable file
47
scripts/run_mixed_observe_ssot.sh
Executable file
@ -0,0 +1,47 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Single-run OBSERVE helper for "is the path actually executed?" checks.
|
||||||
|
#
|
||||||
|
# This script is intentionally NOT a throughput SSOT runner.
|
||||||
|
# It is a pre-flight: verify route/banner + per-class counters + stats are non-zero.
|
||||||
|
#
|
||||||
|
# Usage:
|
||||||
|
# ./scripts/run_mixed_observe_ssot.sh
|
||||||
|
# WS=400 ITERS=20000000 ./scripts/run_mixed_observe_ssot.sh
|
||||||
|
#
|
||||||
|
# Requires: `make bench_random_mixed_hakmem_observe`
|
||||||
|
|
||||||
|
profile=${HAKMEM_PROFILE:-MIXED_TINYV3_C7_SAFE}
|
||||||
|
iters=${ITERS:-20000000}
|
||||||
|
ws=${WS:-400}
|
||||||
|
bin=${BENCH_BIN:-./bench_random_mixed_hakmem_observe}
|
||||||
|
|
||||||
|
# SSOT header: bin sha / profile / iters / ws
|
||||||
|
echo "[SSOT-HEADER] bin=$(sha256sum "${bin}" | cut -c1-8) profile=${profile} iters=${iters} ws=${ws} mode=OBSERVE"
|
||||||
|
|
||||||
|
# Force the same size range as SSOT to avoid class distribution drift.
|
||||||
|
export HAKMEM_BENCH_MIN_SIZE=${SSOT_MIN_SIZE:-16}
|
||||||
|
export HAKMEM_BENCH_MAX_SIZE=${SSOT_MAX_SIZE:-1040}
|
||||||
|
export HAKMEM_BENCH_C5_ONLY=0
|
||||||
|
export HAKMEM_BENCH_C6_ONLY=0
|
||||||
|
export HAKMEM_BENCH_C7_ONLY=0
|
||||||
|
|
||||||
|
# One-shot route configuration banner (Phase 70-1).
|
||||||
|
export HAKMEM_ROUTE_BANNER=1
|
||||||
|
|
||||||
|
# Keep cleanenv defaults aligned with the main runner for knobs that affect control flow.
|
||||||
|
export HAKMEM_WARM_POOL_SIZE=${HAKMEM_WARM_POOL_SIZE:-16}
|
||||||
|
export HAKMEM_TINY_C4_INLINE_SLOTS=${HAKMEM_TINY_C4_INLINE_SLOTS:-1}
|
||||||
|
export HAKMEM_TINY_C5_INLINE_SLOTS=${HAKMEM_TINY_C5_INLINE_SLOTS:-1}
|
||||||
|
export HAKMEM_TINY_C6_INLINE_SLOTS=${HAKMEM_TINY_C6_INLINE_SLOTS:-1}
|
||||||
|
export HAKMEM_TINY_INLINE_SLOTS_FIXED=${HAKMEM_TINY_INLINE_SLOTS_FIXED:-1}
|
||||||
|
export HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH=${HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH:-1}
|
||||||
|
export HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH_FIXED=${HAKMEM_TINY_INLINE_SLOTS_SWITCHDISPATCH_FIXED:-0}
|
||||||
|
|
||||||
|
if [[ "${HAKMEM_BENCH_HEADER_LOG:-1}" == "1" ]]; then
|
||||||
|
sha="$(git rev-parse --short HEAD 2>/dev/null || echo unknown)"
|
||||||
|
echo "[OBSERVE] sha=${sha} bin=${bin} profile=${profile} iters=${iters} ws=${ws} size=${HAKMEM_BENCH_MIN_SIZE}..${HAKMEM_BENCH_MAX_SIZE}" >&2
|
||||||
|
fi
|
||||||
|
|
||||||
|
HAKMEM_PROFILE="${profile}" "${bin}" "${iters}" "${ws}" 1
|
||||||
Reference in New Issue
Block a user